On the Performance of Sparse Recovery via
$\ell_p$-minimization ($0 \le p \le 1$)

Meng Wang Weiyu Xu Ao Tang 
School of ECE, Cornell University, Ithaca, NY 14853, USA 



Abstract

It is known that a high-dimensional sparse vector $x^*$ in $\mathcal{R}^n$ can be recovered from low-dimensional measurements $y = Ax^*$, where $A^{m \times n}$ ($m < n$) is the measurement matrix. In this paper, we investigate the recovering ability of $\ell_p$-minimization ($0 \le p \le 1$) as $p$ varies, where $\ell_p$-minimization returns a vector with the least $\ell_p$ "norm" among all the vectors $x$ satisfying $Ax = y$. Besides analyzing the performance of strong recovery, where $\ell_p$-minimization is required to recover all the sparse vectors up to a certain sparsity, we also for the first time analyze the performance of "weak" recovery of $\ell_p$-minimization ($0 \le p < 1$), where the aim is to recover all the sparse vectors on one support with a fixed sign pattern. When $\alpha (:= m/n) \to 1$, we provide sharp thresholds of the sparsity ratio that differentiate the success and failure of $\ell_p$-minimization. For strong recovery, the threshold strictly decreases from 0.5 to 0.239 as $p$ increases from 0 to 1. Surprisingly, for weak recovery, the threshold is 2/3 for all $p$ in $[0,1)$, while the threshold is 1 for $\ell_1$-minimization. We also explicitly demonstrate that $\ell_p$-minimization ($p < 1$) can return a denser solution than $\ell_1$-minimization. For any $\alpha < 1$, we provide bounds of the sparsity ratio for strong recovery and weak recovery respectively below which $\ell_p$-minimization succeeds with overwhelming probability. Our bound of strong recovery improves on the existing bounds when $\alpha$ is large. In particular, regarding the recovery threshold, this paper argues that $\ell_p$-minimization has a higher threshold with smaller $p$ for strong recovery; the threshold is the same for all $p$ for sectional recovery; and $\ell_1$-minimization can outperform $\ell_p$-minimization for weak recovery. These are in contrast to the traditional wisdom that $\ell_p$-minimization, though computationally more expensive, always has better sparse recovery ability than $\ell_1$-minimization since it is closer to $\ell_0$-minimization. Finally, we provide an intuitive explanation of our findings. Numerical examples are also used to unambiguously confirm and illustrate the theoretical predictions.



I. Introduction 

We consider recovering a vector $x$ in $\mathcal{R}^n$ from an $m$-dimensional measurement $y = Ax$, where
$A^{m \times n}$ ($m < n$) is the measurement matrix. Obviously, given $y$ and $A$, $Ax = y$ is an underdetermined
linear system and admits an infinite number of solutions. However, if $x$ is sparse, i.e. it only has a small
number of nonzero entries compared with its dimension, one can actually recover $x$ from $y$. This topic
is known as compressed sensing and draws much attention recently, for example, |[7||[8|p6}p^. 

Given $x \in \mathcal{R}^n$, its support $T$ is defined as $T = \{i \in \{1, \ldots, n\} : x_i \neq 0\}$. The cardinality $|T|$ of set $T$ is the sparsity of $x$, which also equals the $\ell_0$ norm $\|x\|_0 := |\{i : x_i \neq 0\}|$. We say $x$ is $\rho n$-sparse if $|T| = \rho n$ for some $\rho < 1$. Given the measurement $y$ and the measurement matrix $A$, together with the assumption that $x$ is sparse, one natural estimate of $x$ is the vector with the least $\ell_0$ norm that can produce the measurement $y$. Mathematically, to recover $x$, we solve the following $\ell_0$-minimization problem:

$$\min \|x\|_0 \quad \text{s.t.} \quad Ax = y. \tag{1}$$

However, (1) is combinatorial and computationally intractable, and one commonly used approach is to solve a closely related $\ell_1$-minimization problem:

$$\min \|x\|_1 \quad \text{s.t.} \quad Ax = y, \tag{2}$$

where $\|x\|_1 := \sum_i |x_i|$. (2) is a convex problem and can be recast as a linear program, and thus can be solved
efficiently. Conditions under which (2) can successfully recover $x$ have been extensively studied in the
literature of compressed sensing. For example, one widely known sufficient condition is the Restricted
Isometry Property (RIP) |[§|[7}||8|. 

Among the explosion of research on compressed sensing (|[T||[3|p|p3|p7tp2|p3|), recently, there has 
been great research interest in recovering x by £p-minimization for < ]5 < 1 (||9|| 10|| 12|| 14||22||29 1|2|) 
as follows, 

$$\min \|x\|_p^p \quad \text{s.t.} \quad Ax = y. \tag{3}$$

Recall that $\|x\|_p^p := \sum_i |x_i|^p$ for $p > 0$. Though $\|\cdot\|_p$ does not actually define a norm as it violates the triangular inequality, $\|\cdot\|_p^p$ follows the triangular inequality. We say $x$ can be recovered by $\ell_p$-minimization if and only if it is the unique solution to (3). (3) is non-convex, and thus it is generally hard to compute the global minimum. [9], [10], [12] employ heuristic algorithms to compute a local minimum of (3) and show numerically that these heuristics can indeed recover sparse vectors, and the support size of these vectors can be larger than that of the vectors recoverable from $\ell_1$-minimization. Then the question is: what is the relationship between the sparsity of a vector and the successful recovery with $\ell_p$-minimization ($p < 1$)? How sparse should a vector be so that $\ell_p$-minimization can recover it? [25] shows that the sparsity up to which $\ell_p$-minimization can successfully recover all the sparse vectors at least does not decrease as $p$ decreases. [29] provides a sufficient condition for successful recovery via $\ell_p$-minimization based on Restricted Isometry Constants and provides a lower bound of the support size up to which $\ell_p$-minimization can recover all such sparse vectors. [22] improves this bound by considering a generalized version of the RIP condition, and [4] numerically calculates this bound.
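The remark that $\|\cdot\|_p$ violates the triangular inequality while $\|\cdot\|_p^p$ satisfies it can be checked on a two-dimensional example. The sketch below is illustrative only; the helper names `lp_power` and `lp_quasi_norm` and the choice $p = 0.5$ are assumptions made for this example and do not come from the paper.

```python
def lp_power(x, p):
    """Return ||x||_p^p = sum_i |x_i|^p for 0 < p <= 1."""
    return sum(abs(v) ** p for v in x)

def lp_quasi_norm(x, p):
    """Return ||x||_p = (sum_i |x_i|^p)^(1/p)."""
    return lp_power(x, p) ** (1.0 / p)

p = 0.5
x = [1.0, 0.0]
y = [0.0, 1.0]
s = [a + b for a, b in zip(x, y)]  # x + y = (1, 1)

# ||x + y||_p = 4 exceeds ||x||_p + ||y||_p = 2, so ||.||_p is not a norm.
violates = lp_quasi_norm(s, p) > lp_quasi_norm(x, p) + lp_quasi_norm(y, p)

# ||x + y||_p^p = 2 does not exceed ||x||_p^p + ||y||_p^p = 2.
holds = lp_power(s, p) <= lp_power(x, p) + lp_power(y, p)
```

The same subadditivity of $|\cdot|^p$ is what makes $\|x\|_p^p$, rather than $\|x\|_p$, the convenient objective in the arguments that follow.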

Here are the main contributions of this paper. For strong recovery, where $\ell_p$-minimization needs to recover all the vectors up to a certain sparsity, we provide a sharp threshold $\rho^*(p)$ of the ratio of the support size to the dimension which differentiates the success and the failure of $\ell_p$-minimization when $\alpha (= m/n) \to 1$. This is an exact threshold, in contrast to the lower bounds of successful recovery in previous results. When $p$ increases from 0 to 1, $\rho^*(p)$ decreases from 0.5 to 0.239. This coincides with the intuition that the performance of $\ell_p$-minimization improves as $p$ decreases. When $\alpha < 1$ is fixed, we provide a positive bound $\rho^*(\alpha, p)$ for all $\alpha \in (0,1)$ and all $p \in (0,1]$ of strong recovery such that with a Gaussian measurement matrix, $\ell_p$-minimization can recover all the $\rho^*(\alpha, p) n$-sparse vectors with overwhelming probability. $\rho^*(\alpha, p)$ improves on the existing bound in the large $\alpha$ region.

We also analyze the performance of $\ell_p$-minimization for weak recovery, where we need to recover all the sparse vectors on one support with one sign pattern. To the best of our knowledge, there is no existing result in this regard for $p < 1$. We characterize the successful weak recovery through a necessary and sufficient condition regarding the null space of the measurement matrix. When $\alpha \to 1$, we provide a sharp threshold $\rho_w^*(p)$ of the ratio of the support size to the dimension which differentiates the success and the failure of $\ell_p$-minimization. The weak threshold indicates that if we would like to recover every vector over one support with size less than $\rho_w^*(p) n$ and with one sign pattern (though the support and sign patterns are not known a priori), and we generate a random Gaussian measurement matrix independently of the vectors, then with overwhelmingly high probability, $\ell_p$-minimization will recover all such vectors regardless of the amplitudes of the entries of a vector. For $\ell_1$-minimization, given a vector, if we randomly generate a Gaussian matrix and apply $\ell_1$-minimization, then its recovering ability observed in simulation exactly captures the weak recovery threshold, see [15], [16]. Interestingly, we prove that the weak threshold $\rho_w^*(p)$ is 2/3 for all $p \in [0,1)$, and is lower than the weak threshold of $\ell_1$-minimization, which is 1. Therefore, $\ell_1$-minimization outperforms $\ell_p$-minimization for all $p \in [0,1)$ if we only need to recover sparse vectors on one support with one sign pattern. We also explicitly show that $\ell_p$-minimization ($p \in (0,1)$) can return a vector denser than the original sparse vector while $\ell_1$-minimization successfully recovers the sparse vector. Finally, for every $\alpha < 1$, we provide a positive bound $\rho_w^*(\alpha, p)$ such that $\ell_p$-minimization successfully recovers all the $\rho_w^*(\alpha, p) n$-sparse vectors on one support with one sign pattern.

The rest of the paper is organized as follows. We introduce the null space condition of successful $\ell_p$-minimization in Section II. We especially define the successful weak recovery for $p < 1$ and provide a necessary and sufficient condition. We use an example to illustrate that the solution of $\ell_1$-minimization can be sparser than that of $\ell_p$-minimization ($p \in (0,1)$). Section III provides thresholds of the sparsity ratio of the successful recovery via $\ell_p$-minimization for all $p \in [0,1]$, both in strong recovery and in weak recovery, when the measurement matrix is a random Gaussian matrix and $\alpha \to 1$. For $\alpha < 1$, Section IV provides bounds of the sparsity ratio below which $\ell_p$-minimization is successful in the strong sense and in the weak sense respectively. We compare the performance of $\ell_p$-minimization ($p < 1$) and the performance of $\ell_1$-minimization in Section V, and provide numerical results in Section VI. Section VII concludes the paper.

II. Successful Recovery of $\ell_p$-minimization

We first introduce the null space characterization of the measurement matrix $A$ to capture the successful recovery via $\ell_p$-minimization ($p \in [0,1]$). Besides the strong recovery that has been studied in [4], [13], [22], [23], [25], [29], [31], we especially provide a necessary and sufficient condition for the success of weak recovery in the sense that $\ell_p$-minimization only needs to recover all the sparse vectors on one support with one sign pattern. For example, in practice, if we are given an unknown vector to recover, randomly generate a measurement matrix and solve the $\ell_p$-minimization problem, then the simulation result of recovery performance with respect to the sparsity of the vector indeed represents the performance of weak recovery.

Given a measurement matrix $A^{m \times n}$, let $B^{n \times (n-m)}$ denote a basis of the null space of $A$, then we have $AB = 0$. Let $B_i$ ($i \in \{1, \ldots, n\}$) denote the $i$-th row of $B$. Let $B_T$ denote the submatrix of $B$ with $T \subseteq \{1, \ldots, n\}$ as the set of row indices. In this paper, we will study the sparse recovery property of $\ell_p$-minimization by analyzing the null space of $A$.

We first state the null space condition for the success of strong recovery via $\ell_p$-minimization ([21], [25]) in the sense that $\ell_p$-minimization should recover all the sparse vectors up to a certain sparsity.



Theorem 1 ([21], [25]). $x$ is the unique solution to the $\ell_p$-minimization problem ($0 \le p \le 1$) for every vector $x$ up to $\rho n$-sparse if and only if

$$\|B_T z\|_p^p < \|B_{T^c} z\|_p^p \tag{4}$$

for every non-zero $z \in \mathcal{R}^{n-m}$, and every support $T$ with $|T| \le \rho n$.

One important property is that if the condition (4) is satisfied for some $0 < p \le 1$, then it is also satisfied for all $q \in [0, p]$ ([14], [26]). Therefore, if $\ell_p$-minimization could recover all the $\rho n$-sparse vectors $x$, then $\ell_q$-minimization ($0 \le q \le p$) could also recover all the $\rho n$-sparse vectors. Intuitively, the strong recovery performance of $\ell_q$-minimization should be at least as good as that of $\ell_p$-minimization when $0 \le q \le p \le 1$.

A. Weak recovery for $\ell_p$-minimization

Though $\ell_p$-minimization ($p < 1$) should be at least as good as $\ell_1$-minimization for strong recovery, the argument may not be true for weak recovery.

We first state the null space condition for successful weak recovery via $\ell_1$-minimization as follows (see [19], [25], [30], [34], [36] for this result).

Theorem 2. For every $x \in \mathcal{R}^n$ on some support $T$ with the same sign pattern, $x$ is always the unique solution to the $\ell_1$-minimization problem (2) if and only if

$$\|B_{T^-} z\|_1 < \|B_{T^c} z\|_1 + \|B_{T^+} z\|_1 \tag{5}$$

holds for all non-zero $z \in \mathcal{R}^{n-m}$, where $T^- = \{i \in T : B_i z x_i < 0\}$ and $T^+ = \{i \in T : B_i z x_i > 0\}$.

Note that for every vector $x$ on a fixed support $T$ with a fixed sign pattern, the condition to successfully recover it via $\ell_1$-minimization is the same, as stated in Theorem 2. However, the condition of successful recovery via $\ell_p$-minimization ($0 \le p < 1$) varies for different sparse vectors even if they have the same support and the same sign pattern. In other words, the recovery condition depends on the amplitudes of the entries of the vector. Here we consider the worst case scenario for weak recovery in the sense that the recovery via $\ell_p$-minimization is defined to be "successful" if it can recover all the vectors on a fixed support with a fixed sign pattern. The null space condition for weak recovery in this definition via $\ell_1$-minimization is still the same as that in Theorem 2. We characterize the $\ell_p$-minimization ($p \in (0,1)$) case in Theorem 3 and the $\ell_0$-minimization case in Theorem 4.

Theorem 3. Given any $p \in (0,1)$, for all $x \in \mathcal{R}^n$ on some support $T$ with some fixed sign pattern, $x$ is always the unique solution to the $\ell_p$-minimization problem (3) if and only if the following condition holds:

$$\|B_{T^-} z\|_p^p \le \|B_{T^c} z\|_p^p \tag{6}$$

for all non-zero $z \in \mathcal{R}^{n-m}$, where $T^- = \{i \in T : B_i z x_i < 0\}$; moreover, if $B_{T^+} z = 0$, where $T^+ = \{i \in T : B_i z x_i > 0\}$, it further holds that

$$\|B_{T^-} z\|_p^p < \|B_{T^c} z\|_p^p. \tag{7}$$

Proof: Necessary part. Suppose the condition fails for some $z$, then there are two cases: either $B_{T^+} z = 0$ or $B_{T^+} z \neq 0$.

First consider the case $B_{T^+} z = 0$, then we have $\|B_{T^-} z\|_p^p \ge \|B_{T^c} z\|_p^p$. Define a vector $x$ as follows. Let $x_i = 0$ for every $i$ in $T^c$, let $x_i = -B_i z$ for every $i$ in $T^-$. Let $x_i$ be any value with the fixed sign for every $i$ in $T^+$. Then according to the definition of $x$, we have

$$\begin{aligned}
\|x + Bz\|_p^p &= \|x_{T^-} + B_{T^-} z\|_p^p + \|x_{T^+} + B_{T^+} z\|_p^p + \|B_{T^c} z\|_p^p \\
&= 0 + \|x_{T^+}\|_p^p + \|B_{T^c} z\|_p^p \\
&= \|x\|_p^p - \|x_{T^-}\|_p^p + \|B_{T^c} z\|_p^p \\
&= \|x\|_p^p - \|B_{T^-} z\|_p^p + \|B_{T^c} z\|_p^p.
\end{aligned}$$

Since $\|x + Bz\|_p^p \le \|x\|_p^p$, (3) cannot successfully recover $x$, which is a contradiction.

Secondly, consider the case $B_{T^+} z \neq 0$. Then $\|B_{T^-} z\|_p^p > \|B_{T^c} z\|_p^p$. Let $\delta = \|B_{T^-} z\|_p^p - \|B_{T^c} z\|_p^p > 0$. Define a vector $x$ as follows. Let $x_i = 0$ for every $i$ in $T^c$, let $x_i = -B_i z$ for every $i$ in $T^-$. For every $i$ in $T^+$, since $p \in (0,1)$, we can pick $x_i$ with $|x_i|$ large enough such that $\|x_{T^+} + B_{T^+} z\|_p^p - \|x_{T^+}\|_p^p < \delta/2$. Then

$$\begin{aligned}
\|x + Bz\|_p^p &= 0 + \|x_{T^+} + B_{T^+} z\|_p^p + \|B_{T^c} z\|_p^p \\
&< \|x_{T^+}\|_p^p + \frac{\delta}{2} + \|B_{T^c} z\|_p^p \\
&= \|x\|_p^p - \frac{\delta}{2}.
\end{aligned}$$

Thus $\|x + Bz\|_p^p < \|x\|_p^p$, so $x$ is not a solution to (3), which is also a contradiction.

Sufficient part. Assume the null space condition holds, then for any $x$ on support $T$ with fixed signs, and any non-zero $z \in \mathcal{R}^{n-m}$, we have

$$\begin{aligned}
\|x + Bz\|_p^p &= \|x_{T^+} + B_{T^+} z\|_p^p + \|x_{T^-} + B_{T^-} z\|_p^p + \|B_{T^c} z\|_p^p \\
&\ge \|x_{T^+} + B_{T^+} z\|_p^p + \|x_{T^-}\|_p^p - \|B_{T^-} z\|_p^p + \|B_{T^c} z\|_p^p,
\end{aligned} \tag{8}$$

where the inequality follows from the triangular property that $|x_i + B_i z|^p \ge |x_i|^p - |B_i z|^p$ holds for all $i$ and all $p \in (0,1)$.

If $B_{T^+} z \neq 0$, then $\|x_{T^+} + B_{T^+} z\|_p^p > \|x_{T^+}\|_p^p$ since $B_i z \neq 0$ for some $i$, and $B_i z$ and $x_i$ have the same sign. Since we also have $\|B_{T^-} z\|_p^p \le \|B_{T^c} z\|_p^p$, the right-hand side of (8) is greater than $\|x\|_p^p$. If $B_{T^+} z = 0$, then $\|B_{T^-} z\|_p^p < \|B_{T^c} z\|_p^p$ from the assumption, therefore the right-hand side of (8) is again greater than $\|x\|_p^p$. Thus, $\|x + Bz\|_p^p > \|x\|_p^p$ for all non-zero $z \in \mathcal{R}^{n-m}$, and $x$ is the solution to (3). ■

Similarly, the null space condition for the weak recovery of $\ell_0$-minimization is as follows. We skip its proof as it is similar to that of Theorem 3.

Theorem 4. For all $x \in \mathcal{R}^n$ on one support $T$ with the same sign pattern, $x$ is always the unique solution to the $\ell_0$-minimization problem (1) if and only if

$$\|B_{T^-} z\|_0 < \|B_{T^c} z\|_0 \tag{9}$$

for all non-zero $z \in \mathcal{R}^{n-m}$, where $T^- = \{i \in T : B_i z x_i < 0\}$.

For the strong recovery, the null space conditions of $\ell_1$-minimization and $\ell_p$-minimization ($0 \le p < 1$) share the same form (4), and if (4) holds for some $p \le 1$, it also holds for all $q \in [0, p]$. However, for recovery of sparse vectors on one support with one sign pattern, from Theorems 2, 3 and 4 we know that although the conditions of $\ell_p$-minimization ($0 < p < 1$) and $\ell_0$-minimization share a similar form in (6), (7) and (9), the condition of $\ell_1$-minimization has a very different form in (5). Moreover, if (6) holds for some $p \in (0,1)$, it does not necessarily hold for some $q \in (0, p)$. Therefore the way that the performance of weak recovery changes over $p$ may be quite different from the way that the performance of strong recovery changes over $p$. Moreover, the performance of weak recovery of $\ell_1$-minimization may be significantly different from that of $\ell_p$-minimization for $p \in (0,1)$. We will further discuss this issue.

B. The solution of $\ell_1$-minimization can be sparser than that of $\ell_p$-minimization ($p \in (0,1)$)

$\ell_p$-minimization ($p \in (0,1)$) may not perform as well as $\ell_1$-minimization in some cases, for example in the weak recovery which we will discuss in Section III and Section IV. Here we employ a numerical example to illustrate that in certain cases $\ell_1$-minimization can recover the sparse vector while $\ell_p$-minimization ($p \in (0,1)$) cannot, and the solution of $\ell_p$-minimization is denser than the original sparse vector.

Example 1. $\ell_p$-minimization returns a denser solution than $\ell_1$-minimization.

Let the measurement matrix $A$ be a $(6k-1) \times 6k$ matrix with $\beta \in \mathcal{R}^{6k}$ as a basis of its null space, and $\beta_i = 1$ for all $i \in \{1, \ldots, k\}$, $\beta_i = -1$ for all $i \in \{k+1, \ldots, 2k\}$, and $\beta_i = 1/64$ for all $i \in \{2k+1, \ldots, 6k\}$. According to Theorem 1, one can calculate that $\ell_1$-minimization can recover all the $(\lceil \frac{33}{32} k \rceil - 1)$-sparse vectors in $\mathcal{R}^{6k}$, and $\ell_{0.5}$-minimization can recover all the $(\lceil \frac{5}{4} k \rceil - 1)$-sparse vectors in $\mathcal{R}^{6k}$. Therefore, in terms of strong recovery, $\ell_{0.5}$-minimization has a better performance than $\ell_1$-minimization as it can recover all the vectors up to a higher sparsity.
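The strong recovery claim of Example 1 can be checked numerically. The sketch below assumes, as in the example, that the null space is spanned by a single vector $\beta$, so the condition of Theorem 1 only has to be tested on $\beta$ itself and on the worst support, which collects the largest entries of $|\beta_i|^p$. The function name `max_strong_sparsity` and the choice $k = 10$ are illustrative assumptions, not part of the paper.

```python
def max_strong_sparsity(beta, p):
    """Largest s such that the Theorem 1 condition holds for every support
    of size at most s, assuming the null space is spanned by beta alone."""
    powers = sorted((abs(b) ** p for b in beta), reverse=True)
    total = sum(powers)
    s, top = 0, 0.0
    for value in powers:
        # The worst support of size s + 1 holds the s + 1 largest |beta_i|^p;
        # the condition requires their sum to stay below half of the total.
        if 2.0 * (top + value) < total:
            top += value
            s += 1
        else:
            break
    return s

k = 10
beta = [1.0] * k + [-1.0] * k + [1.0 / 64] * (4 * k)

s_l1 = max_strong_sparsity(beta, 1.0)   # sparsity guaranteed for l_1
s_l05 = max_strong_sparsity(beta, 0.5)  # sparsity guaranteed for l_0.5
```

For $k = 10$ this gives a guaranteed sparsity of 10 for $\ell_1$-minimization and 12 for $\ell_{0.5}$-minimization, in line with the statement that $\ell_{0.5}$-minimization is better in the strong sense.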

Now consider the "weak" recovery as to recover all the nonnegative vectors on support $T = \{1, \ldots, 2k\}$. According to Theorem 2 and Theorem 3, one can check that $\ell_1$-minimization can indeed recover all the nonnegative vectors on support $T$; however, $\ell_{0.5}$-minimization fails to recover some vectors in this case. For example, consider a $2k$-sparse vector $x^*$ with $x^*_i = 9$ for all $i \in \{1, \ldots, k\}$, $x^*_i = 1$ for all $i \in \{k+1, \ldots, 2k\}$, and $x^*_i = 0$ for all $i \in \{2k+1, \ldots, 6k\}$. One can check that among all the vectors $x = x^* + h\beta$, $\forall h \in \mathcal{R}$, which are the solutions to $Ax = Ax^*$, $x^*$ has the least $\ell_1$ norm, therefore $x^*$ is the solution to (2) and can be successfully recovered via $\ell_1$-minimization. Now consider $\ell_{0.5}$-minimization: we have $\|x^*\|_{0.5}^{0.5} = 4k$. Consider the nonnegative $5k$-sparse vector $x' = x^* + \beta$ with $x'_i = 10$ for all $i \in \{1, \ldots, k\}$, $x'_i = 0$ for all $i \in \{k+1, \ldots, 2k\}$, and $x'_i = 1/64$ for all $i \in \{2k+1, \ldots, 6k\}$. We have $Ax' = Ax^*$, and one can check that $\|x'\|_{0.5}^{0.5} = (\sqrt{10} + 0.5)k < \|x^*\|_{0.5}^{0.5}$ for all $k > 2$. Moreover, with a little calculation one can prove that $x'$ is indeed the solution to (3). Thus, the solution of $\ell_{0.5}$-minimization is a $5k$-sparse vector although the original vector $x^*$ is only $2k$-sparse. Therefore $\ell_{0.5}$-minimization fails to recover some nonnegative $2k$-sparse vector $x^*$ while $x^*$ is the solution to $\ell_1$-minimization, and the solution of $\ell_{0.5}$-minimization is denser than the original vector $x^*$.
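The weak recovery part of Example 1 can also be verified directly, since every feasible point has the form $x^* + h\beta$. The following sketch is a numerical illustration under assumed values ($k = 3$, a grid of step 0.01 for $h$); the variable names are hypothetical. For the $\ell_{0.5}$ objective it only compares the three values of $h$ at which some entry of $x^* + h\beta$ vanishes, relying on the fact that the objective is concave between such points.

```python
def lp_power(x, p):
    """Return sum_i |x_i|^p."""
    return sum(abs(v) ** p for v in x)

def feasible_point(x_star, beta, h):
    """Return x* + h * beta, which satisfies A x = A x* for every h."""
    return [a + h * b for a, b in zip(x_star, beta)]

k = 3
beta = [1.0] * k + [-1.0] * k + [1.0 / 64] * (4 * k)
x_star = [9.0] * k + [1.0] * k + [0.0] * (4 * k)
x_prime = feasible_point(x_star, beta, 1.0)

# l_1 objective on a grid of h values: the minimum is attained at h = 0.
grid = [h / 100.0 for h in range(-1500, 1501)]
l1_values = [lp_power(feasible_point(x_star, beta, h), 1.0) for h in grid]
l1_minimizer = grid[l1_values.index(min(l1_values))]

# l_0.5 objective at the breakpoints h = -9, 0, 1 where an entry hits zero.
breakpoints = [-9.0, 0.0, 1.0]
l05_best = min(breakpoints,
               key=lambda h: lp_power(feasible_point(x_star, beta, h), 0.5))

half_star = lp_power(x_star, 0.5)    # equals 4k
half_prime = lp_power(x_prime, 0.5)  # equals (sqrt(10) + 0.5) k
support_prime = sum(1 for v in x_prime if v != 0.0)  # equals 5k
```

With these values the $\ell_1$ objective is minimized at $h = 0$, i.e. at $x^*$, while the $\ell_{0.5}$ objective prefers $h = 1$, i.e. the denser vector $x'$.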

III. Recovery thresholds when $\lim_{n \to \infty} m/n \to 1$
In this paper we focus on the case that each entry of the measurement matrix $A$ is drawn from the standard
Gaussian distribution. Since $A$ has i.i.d. $\mathcal{N}(0,1)$ entries, the null space of $A$ is rotationally invariant, thus
there exists a basis $B^{n \times (n-m)}$ of the null space of $A$ such that $AB = 0$ and $B$ has i.i.d. $\mathcal{N}(0,1)$ entries,
please refer to |[8|p5| for details. 



We first focus on the case that $\alpha = m/n \to 1$ and provide recovery thresholds of $\ell_p$-minimization for every $p \in [0,1]$. We consider two types of thresholds: one in the strong sense, as we require $\ell_p$-minimization to recover all $\rho n$-sparse vectors (Section III-A), and one in the weak sense, as we only require $\ell_p$-minimization to recover all the vectors on a certain support with a certain sign pattern (Section III-B). We call it a threshold as for any sparsity below that threshold, $\ell_p$-minimization can recover all the sparse vectors either in the strong sense or the weak sense, and for any sparsity above that threshold, $\ell_p$-minimization fails to recover some sparse vector. These thresholds can be viewed as the limiting behavior of $\ell_p$-minimization, since for any constant $\alpha < 1$, the recovery thresholds of $\ell_p$-minimization would be no greater than the ones provided here.



A. Strong Recovery

In this section, for given $p$, when $\alpha \to 1$, we shall provide a threshold $\rho^*(p)$ for strong recovery such that for any $\rho < \rho^*(p)$, $\ell_p$-minimization (3) can recover all $\rho n$-sparse vectors $x$ with overwhelming probability. Our technique here stems from [20], which only focuses on the strong recovery of $\ell_1$-minimization.

We have already discussed in Section II that the performance of $\ell_q$-minimization should be no worse than $\ell_p$-minimization for strong recovery when $0 \le q \le p \le 1$. Although there are results about bounds of the sparsity below which $\ell_p$-minimization can recover all the sparse vectors, no existing result has explicitly calculated the recovery threshold of $\ell_p$-minimization for $p < 1$ which differentiates the success and failure of $\ell_p$-minimization. To this end, we will first define $\rho^*(p)$ in the following lemma, and then prove that $\rho^*(p)$ is indeed the threshold of strong recovery in a later part.

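Lemma 1. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\mathcal{N}(0,1)$ random variables and let $Y_1, Y_2, \ldots, Y_n$ be the sorted ordering (in non-increasing order) of $|X_1|^p, |X_2|^p, \ldots, |X_n|^p$ for some $p \in (0,1]$. For a $\rho > 0$, define $S_\rho$ as $\sum_{i=1}^{\lceil \rho n \rceil} Y_i$. Let $S$ denote $E[S_1]$, the expected value of $S_1$. Then there exists a constant $\rho^*(p)$ such that $\lim_{n \to \infty} E[S_{\rho^*}]/S = \frac{1}{2}$.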

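Proof: Let $X \sim \mathcal{N}(0,1)$ and let $Z = |X|$. Let $f(z)$ and $F(z)$ denote the p.d.f. and c.d.f. of $Z$ respectively. Then

$$f(z) = \begin{cases} \sqrt{2/\pi}\, e^{-z^2/2}, & \text{if } z \ge 0, \\ 0, & \text{if } z < 0, \end{cases} \tag{10}$$

$$F(z) = \begin{cases} \mathrm{erf}(z/\sqrt{2}) = \int_0^z \sqrt{2/\pi}\, e^{-x^2/2}\, dx, & \text{if } z \ge 0, \\ 0, & \text{if } z < 0. \end{cases} \tag{11}$$

Define $g(t) = \int_t^\infty z^p f(z)\, dz$. $g$ is continuous and decreasing in $[0, \infty)$, with $g(0) = E[Z^p] = S/n$ and $\lim_{t \to \infty} g(t) = 0$. Then there exists $z^*$ such that $g(z^*) = g(0)/2$, i.e.

$$\int_0^{z^*} x^p f(x)\, dx - \int_{z^*}^\infty x^p f(x)\, dx = 0. \tag{12}$$

Define

$$\rho^* = 1 - F(z^*). \tag{13}$$

We claim $\rho^*$ has the desired property.

Let $T_t = \sum_{i : Y_i \ge t^p} Y_i$. Then $E[T_{z^*}] = n g(z^*)$. Since $E[|T_{z^*} - S_{\rho^*}|]$ is bounded by $O(\sqrt{n})$, and $S = n g(0)$, we have $\lim_{n \to \infty} E[S_{\rho^*}]/S = \frac{1}{2}$. ■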
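The definition of $\rho^*$ through (12) and (13) lends itself to direct numerical evaluation. The sketch below is one possible way to do so and is not the procedure used in the paper: it assumes a midpoint rule for the integrals, truncates the upper limit at 12, and locates $z^*$ by bisection. The function names `half_normal_pdf`, `partial_moment` and `strong_threshold` are hypothetical.

```python
import math

def half_normal_pdf(x):
    """Density f of Z = |X| for X ~ N(0, 1), as in (10), for x >= 0."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)

def partial_moment(p, a, b, steps=4000):
    """Midpoint-rule approximation of the integral of x^p f(x) over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += (x ** p) * half_normal_pdf(x)
    return h * total

def strong_threshold(p, upper=12.0, iterations=50):
    """Approximate rho*(p) = 1 - F(z*), where z* splits the p-th moment
    of |X| into two equal halves as required by (12)."""
    full = partial_moment(p, 0.0, upper)  # tail beyond `upper` is negligible
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if partial_moment(p, 0.0, mid) < 0.5 * full:
            lo = mid
        else:
            hi = mid
    z_star = 0.5 * (lo + hi)
    return 1.0 - math.erf(z_star / math.sqrt(2.0))
```

As a usage example, `strong_threshold(1.0)` evaluates to approximately 0.239, and `strong_threshold(p)` approaches 0.5 from below as `p` is taken close to zero, which agrees with the values quoted for the strong threshold.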



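Proposition 1. The function $\rho^*(p)$ is strictly decreasing in $p$ on $(0,1]$.

Proof: From the definition of $z^*$ in (12), we have

$$H(z^*, p) := \int_0^{z^*} x^p f(x)\, dx - \int_{z^*}^\infty x^p f(x)\, dx = 0, \tag{14}$$

where $f(\cdot)$ and $F(\cdot)$ are defined in (10) and (11). From the Implicit Function Theorem,

$$\frac{dz^*}{dp} = -\frac{\partial H / \partial p}{\partial H / \partial z^*} = -\frac{\int_0^{z^*} x^p (\ln x) f(x)\, dx - \int_{z^*}^\infty x^p (\ln x) f(x)\, dx}{2 z^{*p} f(z^*)}.$$

From (13), we have $\frac{d\rho^*}{dz^*} = -f(z^*)$. From the chain rule, we know $\frac{d\rho^*}{dp} = \frac{d\rho^*}{dz^*} \frac{dz^*}{dp}$, thus

$$\frac{d\rho^*}{dp} = \frac{\int_0^{z^*} x^p (\ln x) f(x)\, dx - \int_{z^*}^\infty x^p (\ln x) f(x)\, dx}{2 z^{*p}}. \tag{15}$$

Note that

$$\int_0^{z^*} x^p (\ln x) f(x)\, dx < \int_0^{z^*} x^p (\ln z^*) f(x)\, dx = \int_{z^*}^\infty x^p (\ln z^*) f(x)\, dx < \int_{z^*}^\infty x^p (\ln x) f(x)\, dx, \tag{16}$$

where the equality follows from (14). Then the numerator of (15) is less than 0 from (16), thus $\frac{d\rho^*}{dp} < 0$. ■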



We plot $\rho^*$ against $p$ numerically in Fig. 1. $\rho^*(p)$ goes to $\frac{1}{2}$ as $p$ tends to zero. Note that $\rho^*(1) = 0.239\ldots$, which coincides with the result in [20].

Now we proceed to prove that $\rho^*$ is the threshold of successful recovery with $\ell_p$-minimization for $p$ in $(0,1]$. First we state the concentration property of $S_\rho$ in the following lemma.



Fig. 1. Threshold $\rho^*$ of successful recovery with $\ell_p$-minimization



Lemma 2. For any $p \in (0,1]$, let $X_1, \ldots, X_n$, $Y_1, \ldots, Y_n$, $S_\rho$ and $S$ be as above. For any $\rho > 0$ and any $\delta > 0$, there exists a constant $c_1 > 0$ such that when $n$ is large enough, with probability at least $1 - 2e^{-c_1 n}$, $|S_\rho - E[S_\rho]| \le \delta S$.

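Proof: Let $X = [X_1, \ldots, X_n]^T$. If two vectors $X$ and $X'$ only differ in co-ordinate $i$, then for any $p$, $|S_\rho(X) - S_\rho(X')| \le \big| |X_i|^p - |X'_i|^p \big|$. Thus for any $X$ and $X'$,

$$|S_\rho(X) - S_\rho(X')| \le \sum_i \big| |X_i|^p - |X'_i|^p \big|.$$

Since $\big| |X_i|^p - |X'_i|^p \big| \le |X_i - X'_i|^p$ for all $p \in (0,1]$,

$$|S_\rho(X) - S_\rho(X')| \le \sum_i |X_i - X'_i|^p. \tag{17}$$

From the isoperimetric inequality for the Gaussian measure [28], for any set $A$ with measure at least a half, the set $A_t = \{x \in \mathcal{R}^n : d(x, A) \le t\}$ has measure at least $1 - e^{-t^2/2}$, where $d(x, A) = \inf_{y \in A} \|x - y\|_2$. Let $M_\rho$ be the median value of $S_\rho = S_\rho(X)$. Define the set $A = \{x \in \mathcal{R}^n : S_\rho(x) \le M_\rho\}$, then

$$P(d(x, A) \le t) \ge 1 - e^{-t^2/2}.$$

We claim that $d(x, A) \le t$ implies that $S_\rho(x) \le M_\rho + n^{1-p/2} t^p$. If $x \in A$, then $S_\rho(x) \le M_\rho$, thus the claim holds as $n^{1-p/2} t^p$ is nonnegative. If $x \notin A$, then there exists $x' \in A$ such that $\|x - x'\|_2 \le t$. Let $u_i = 1$ for all $i$ and let $v_i = |x_i - x'_i|^p$. From Hölder's inequality,

$$\sum_i |x_i - x'_i|^p = \sum_i u_i v_i \le \Big(\sum_i u_i^{\frac{1}{1-p/2}}\Big)^{1-p/2} \Big(\sum_i v_i^{2/p}\Big)^{p/2} \le n^{1-p/2} (t^2)^{p/2} = n^{1-p/2} t^p. \tag{18}$$

From (17) and (18), $|S_\rho(x) - S_\rho(x')| \le n^{1-p/2} t^p$. Since $x \notin A$ and $x' \in A$, then $S_\rho(x) > M_\rho \ge S_\rho(x')$. Thus $S_\rho(x) \le M_\rho + n^{1-p/2} t^p$, which verifies our claim. Then

$$P(S_\rho(x) \le M_\rho + n^{1-p/2} t^p) \ge P(d(x, A) \le t) \ge 1 - e^{-t^2/2}. \tag{19}$$

Similarly,

$$P(S_\rho(x) \ge M_\rho - n^{1-p/2} t^p) \ge 1 - e^{-t^2/2}. \tag{20}$$

Combining (19) and (20),

$$P(|S_\rho(x) - M_\rho| \ge n^{1-p/2} t^p) \le 2e^{-t^2/2}. \tag{21}$$

The difference of $E[S_\rho]$ and $M_\rho$ can be bounded as follows,

$$|E[S_\rho] - M_\rho| \le E[|S_\rho - M_\rho|] = \int_0^\infty P(|S_\rho(x) - M_\rho| \ge y)\, dy \le \int_0^\infty 2 e^{-\frac{1}{2} y^{2/p} n^{1 - 2/p}}\, dy = n^{1-p/2} \int_0^\infty 2 e^{-\frac{1}{2} s^{2/p}}\, ds.$$

Note that $c := \int_0^\infty 2 e^{-\frac{1}{2} s^{2/p}}\, ds$ is a finite constant for all $p \in (0,1]$. As $\rho > 0$ and $S = nE[|x_i|^p]$, for any $\delta > 0$, $c\, n^{1-p/2} < \frac{1}{2}\delta S$ when $n$ is large enough.

Let $t = \big(\frac{1}{2}\delta S / n^{1-p/2}\big)^{1/p} = \big(\frac{1}{2}\delta E[|x_i|^p]\big)^{1/p} \sqrt{n}$. From (21), with probability at least $1 - 2e^{-\frac{1}{2}(\frac{1}{2}\delta E[|x_i|^p])^{2/p} n}$, $|S_\rho - M_\rho| \le \frac{1}{2}\delta S$. Thus $|S_\rho - E[S_\rho]| \le |S_\rho - M_\rho| + |M_\rho - E[S_\rho]| \le \delta S$ with probability at least $1 - 2e^{-c_1 n}$ for some constant $c_1$. ■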
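The Hölder step (18) in the proof above bounds $\sum_i |d_i|^p$ by $n^{1-p/2} \|d\|_2^p$ for any difference vector $d$. The following sketch checks this bound on random vectors; it is only a sanity check under assumed parameters (dimension 50, a fixed random seed, a few values of $p$), and the helper name `holder_gap` is hypothetical.

```python
import random

def holder_gap(diff, p):
    """Return n^(1 - p/2) * ||diff||_2^p - sum_i |diff_i|^p.
    The bound (18) states that this quantity is nonnegative."""
    n = len(diff)
    lhs = sum(abs(d) ** p for d in diff)
    l2 = sum(d * d for d in diff) ** 0.5
    return n ** (1.0 - p / 2.0) * l2 ** p - lhs

rng = random.Random(0)
gaps = []
for p in (0.25, 0.5, 0.75, 1.0):
    for _ in range(200):
        diff = [rng.gauss(0.0, 1.0) for _ in range(50)]
        gaps.append(holder_gap(diff, p))

# A constant vector attains equality in Hoelder's inequality.
equality_gap = holder_gap([2.0] * 10, 0.5)
```

The gap is nonnegative on all sampled vectors and vanishes (up to rounding) when all entries of the difference vector have the same magnitude, which is the equality case of Hölder's inequality.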

Corollary 1. For any $\rho < \rho^*$, there exist a $\delta > 0$ and a constant $c_2 > 0$ such that when $n$ is large enough, with probability at least $1 - 2e^{-c_2 n}$, $S_\rho \le (\frac{1}{2} - \delta)S$.

Proof: When $\rho < \rho^*$,

$$E[S_\rho] = E[S_{\rho^*}] - \sum_{i = \lceil \rho n \rceil + 1}^{\lceil \rho^* n \rceil} E[Y_i] \le E[S_{\rho^*}] - (\lceil \rho^* n \rceil - \lceil \rho n \rceil) E[|X_i|^p].$$

Then $E[S_\rho]/S \le \frac{1}{2} - 2\delta$ for a suitable $\delta$, as $S = nE[|X_i|^p]$. The result follows by combining the above with Lemma 2. ■

Corollary 2. For any $\epsilon > 0$, there exists a constant $c_3 > 0$ such that when $n$ is large enough, with probability at least $1 - 2e^{-c_3 n}$, it holds that $(1 - \epsilon)S \le S_1 \le (1 + \epsilon)S$.

The above two corollaries indicate that with overwhelming probability the sum of the largest $\lceil \rho n \rceil$ terms of the $Y_i$'s is less than half of the total sum $S_1$ if $\rho < \rho^*$. The following lemma extends the result to every vector $Bz$, where the matrix $B^{n \times (n-m)}$ has i.i.d. Gaussian entries and $z$ is any non-zero vector in $\mathcal{R}^{n-m}$.
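The statement that the largest $\lceil \rho n \rceil$ terms carry less than half of the total sum when $\rho < \rho^*$ can be observed in a small simulation. The sketch below is illustrative only: it assumes $p = 1$ (for which $\rho^* \approx 0.239$), a sample size of 20000 and a fixed seed, and it compares against the sample total $S_1$ rather than its expectation $S$. The function name `top_fraction_ratio` is hypothetical.

```python
import math
import random

def top_fraction_ratio(n, rho, p, rng):
    """Draw n standard Gaussians and return S_rho / S_1 for that sample,
    where S_rho sums the ceil(rho * n) largest values of |X_i|^p."""
    powers = sorted((abs(rng.gauss(0.0, 1.0)) ** p for _ in range(n)),
                    reverse=True)
    top = sum(powers[: math.ceil(rho * n)])
    return top / sum(powers)

rng = random.Random(1)
ratio_at_threshold = top_fraction_ratio(20000, 0.239, 1.0, rng)  # near 1/2
ratio_below = top_fraction_ratio(20000, 0.20, 1.0, rng)          # below 1/2
```

At $\rho = 0.239$ the ratio concentrates around one half, while at $\rho = 0.20$ it stays clearly below one half, as the two corollaries predict for large $n$.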

Lemma 3. For any $0 < p \le 1$, given any $\rho < \rho^*(p)$, there exist constants $0 < c_4 < 1$, $c_5 > 0$, $\delta > 0$ such that when $\alpha = m/n > c_4$ and $n$ is large enough, with probability at least $1 - e^{-c_5 n}$, an $n \times (n-m)$ matrix $B$ with i.i.d. $\mathcal{N}(0,1)$ entries has the following property: for every non-zero $z \in \mathcal{R}^{n-m}$ and every subset $T \subseteq \{1, \ldots, n\}$ with $|T| \le \rho n$, $\|B_{T^c} z\|_p^p - \|B_T z\|_p^p \ge \delta S \|z\|_2^p$.



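Proof: For any given $\gamma > 0$, there exists a $\gamma$-net $\Sigma$ in $\mathcal{R}^{n-m}$ of cardinality less than $(1 + \frac{2}{\gamma})^{n-m}$ ([28]). A $\gamma$-net $\Sigma$ is a set of points in $\mathcal{R}^{n-m}$ such that $\|v^k\|_2 = 1$ for all $v^k$ in $\Sigma$, and for any $z \in \mathcal{R}^{n-m}$ with $\|z\|_2 = 1$, there exists some $v^k$ such that $\|z - v^k\|_2 \le \gamma$.

Since $B$ has i.i.d. $\mathcal{N}(0,1)$ entries, $Bv^k$ has $n$ i.i.d. $\mathcal{N}(0,1)$ entries for every $v^k$. From Corollaries 1 and 2, we know that given any $\rho < \rho^*$, for some $\delta > 0$ and for every $\epsilon > 0$, there exist $c_2 > 0$ and $c_3 > 0$ such that with probability at least $1 - 2e^{-c_2 n} - 2e^{-c_3 n}$, we have

$$S_\rho(Bv^k) \le (\tfrac{1}{2} - \delta)S \tag{22}$$

and

$$(1 - \epsilon)S \le S_1(Bv^k) \le (1 + \epsilon)S \tag{23}$$

both hold for a vector $v^k$ in $\Sigma$. Then applying the union bound, we know that (22) and (23) hold for all vectors in $\Sigma$ with probability at least

$$1 - (1 + 2/\gamma)^{n-m} (2e^{-c_2 n} + 2e^{-c_3 n}). \tag{24}$$

Let $\alpha = m/n$, then as long as $\alpha > c_4 := 1 - \frac{\min(c_2, c_3)}{\ln(1 + 2/\gamma)}$, the probability in (24) is greater than $1 - e^{-c_5 n}$ for some constant $c_5 > 0$.

For any $z$ such that $\|z\|_2 = 1$, there exists $v_0$ in $\Sigma$ such that $\|z - v_0\|_2 =: \gamma_1 \le \gamma$. Let $z_1$ denote $z - v_0$, then $\|z_1 - \gamma_1 v_1\|_2 =: \gamma_2 \le \gamma_1 \gamma \le \gamma^2$ for some $v_1$ in $\Sigma$. Repeating this process, we have

$$z = \sum_{j \ge 0} \gamma_j v_j, \tag{25}$$

where $\gamma_0 = 1$, $\gamma_j \le \gamma^j$ and $v_j \in \Sigma$. Thus for any $z \in \mathcal{R}^{n-m}$, we have $z = \|z\|_2 \sum_{j \ge 0} \gamma_j v_j$.

For any index set $T$ with $|T| \le \rho n$,

$$\begin{aligned}
\|B_T z\|_p^p &= \|z\|_2^p \Big\| \sum_{j \ge 0} \gamma_j B_T v_j \Big\|_p^p \\
&\le \|z\|_2^p \sum_{j \ge 0} \gamma^{jp} \|B_T v_j\|_p^p \\
&\le S \|z\|_2^p \frac{1 - 2\delta}{2(1 - \gamma^p)},
\end{aligned}$$

$$\begin{aligned}
\|Bz\|_p^p &= \|z\|_2^p \Big\| \sum_{j \ge 0} \gamma_j B v_j \Big\|_p^p \\
&\ge \|z\|_2^p \Big( \|Bv_0\|_p^p - \sum_{j \ge 1} \gamma^{jp} \|Bv_j\|_p^p \Big) \\
&\ge \|z\|_2^p \Big( (1 - \epsilon)S - \frac{\gamma^p}{1 - \gamma^p} (1 + \epsilon)S \Big) \\
&= S \|z\|_2^p \frac{1 - 2\gamma^p - \epsilon}{1 - \gamma^p}.
\end{aligned}$$

Thus $\|B_{T^c} z\|_p^p - \|B_T z\|_p^p \ge S \|z\|_2^p \frac{2\delta - 2\gamma^p - \epsilon}{1 - \gamma^p}$. For a given $\delta$, we can pick $\gamma$ and $\epsilon$ small enough such that $\|B_{T^c} z\|_p^p - \|B_T z\|_p^p \ge \delta S \|z\|_2^p$. ■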
We can now establish one main result regarding the threshold of successful recovery via $\ell_p$-minimization.

Theorem 5. For any $0 < p \le 1$, given any $\rho < \rho^*(p)$, there exist constants $0 < c_4 < 1$, $c_5 > 0$ such that when $\alpha > c_4$ and $n$ is large enough, with probability at least $1 - e^{-c_5 n}$, an $m \times n$ matrix $A$ with i.i.d. $\mathcal{N}(0,1)$ entries has the following property: for every $x \in \mathcal{R}^n$ with its support $T$ satisfying $|T| \le \rho n$, $x$ is the unique solution to the $\ell_p$-minimization problem (3).

Proof: Lemma 3 indicates that $\sum_{i \in T^c} |(Bz)_i|^p - \sum_{i \in T} |(Bz)_i|^p \ge \delta S \|z\|_2^p > 0$ for every non-zero $z$, then from Theorem 1, $x$ is the unique solution to the $\ell_p$-minimization problem (3). ■
We remark here that $\rho^*$ is a sharp bound for successful recovery. For any $\rho > \rho^*$, from Lemma 2, with overwhelming probability the sum of the largest $\lceil \rho n \rceil$ terms of the $|B_i z|^p$'s is more than half of the total sum $S_1$, i.e. the null space condition stated in Theorem 1 for successful recovery via $\ell_p$-minimization fails with overwhelming probability. Therefore, $\ell_p$-minimization fails to recover some $\rho n$-sparse vector with overwhelming probability. Proposition 1 implies that the threshold strictly decreases as $p$ increases. The performance of $\ell_{p_1}$-minimization is better than that of $\ell_{p_2}$-minimization for $0 < p_1 < p_2 \le 1$, as $\ell_{p_1}$-minimization can recover vectors up to a higher sparsity.



B. Weak Recovery

We have demonstrated in Section III-A that the threshold for strong recovery strictly decreases as $p$ increases from 0 to 1. Here we provide a weak recovery threshold for all $p \in [0,1)$ when $\alpha \to 1$. As we shall see, for weak recovery, the threshold of $\ell_p$-minimization is the same for all $p \in [0,1)$, and is lower than the threshold of $\ell_1$-minimization.

Recall that for successful weak recovery, $\ell_p$-minimization should recover all the vectors on some fixed support with a fixed sign pattern, and the equivalent null space characterization is stated in Theorem 3 and Theorem 4.

We define $x^0 = 1$ for all $x \neq 0$, and $0^0 = 0$. To characterize the recovery threshold of $\ell_p$-minimization in this case, we first state the following lemma.

Lemma 4. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\mathcal{N}(0,1)$ random variables and $T$ be a set of indices with size $|T| = \rho n$ for some $\rho > 0$. Let $x \in \mathcal{R}^n$ be any vector on support $T$ with a fixed sign pattern. For every $p \in [0,1)$, for every $\epsilon > 0$, when $n$ is large enough, with probability at least $1 - e^{-c_6 n}$ for some constant $c_6 > 0$, the following two properties hold simultaneously:

• $\frac{1}{2}\rho n(\mu - \epsilon) < \sum_{i \in T : X_i x_i < 0} |X_i|^p < \frac{1}{2}\rho n(\mu + \epsilon)$,

• $(1 - \rho) n (\mu - \epsilon) < \sum_{i \in T^c} |X_i|^p < (1 - \rho) n (\mu + \epsilon)$,

where $\mu = E[|X|^p]$, $X \sim \mathcal{N}(0,1)$.

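Proof: Define a random variable $s_i$ for each $i$ in $T$ that is equal to 1 if $X_i x_i < 0$ and equal to 0 otherwise. Then $\sum_{i \in T : X_i x_i < 0} |X_i|^p = \sum_{i \in T} |X_i|^p s_i$, and $E[|X_i|^p s_i] = \frac{1}{2}\mu$ for every $i$ in $T$ as $X_i \sim \mathcal{N}(0,1)$.

From the Chernoff bound, for any $\epsilon > 0$, there exist $d_1 > 0$ and $d_2 > 0$ such that

$$P\Big[\sum_{i \in T} |X_i|^p s_i \le \tfrac{1}{2}\rho n(\mu - \epsilon)\Big] \le e^{-d_1 n}, \qquad P\Big[\sum_{i \in T} |X_i|^p s_i \ge \tfrac{1}{2}\rho n(\mu + \epsilon)\Big] \le e^{-d_2 n}.$$

Again from the Chernoff bound, there exist some constants $d_3 > 0$, $d_4 > 0$ such that

$$P\Big[\sum_{i \in T^c} |X_i|^p \le (1 - \rho) n (\mu - \epsilon)\Big] \le e^{-d_3 n}, \qquad P\Big[\sum_{i \in T^c} |X_i|^p \ge (1 - \rho) n (\mu + \epsilon)\Big] \le e^{-d_4 n}.$$

By the union bound, there exists some constant $c_6 > 0$ such that the two properties stated in the lemma hold at the same time with probability at least $1 - e^{-c_6 n}$. ■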



Lemma 4 implies that $\sum_{i \in T : X_i x_i < 0} |X_i|^p < \sum_{i \in T^c} |X_i|^p$ holds with high probability when $|T| = \rho n < \frac{2}{3} n$. Applying the similar net argument as in Section III-A, we can extend the result to every vector $Bz$, where the matrix $B^{n \times (n-m)}$ has i.i.d. Gaussian entries and $z$ is any non-zero vector in $\mathcal{R}^{n-m}$. Then we can establish the main result regarding the threshold of successful recovery with $\ell_p$-minimization from vectors on one support with the same sign pattern.
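The origin of the value 2/3 is the balance $\frac{1}{2}\rho\mu < (1 - \rho)\mu$ between the two sums in Lemma 4, which holds exactly when $\rho < 2/3$. The simulation below illustrates this balance for a single Gaussian vector; it is a sketch under assumed settings ($n = 20000$, $p = 0.5$, a fixed seed, $T$ taken as the first $\rho n$ indices, and a positive sign pattern on $T$), and the function name `weak_margin` is hypothetical.

```python
import random

def weak_margin(n, rho, p, rng):
    """Return (sum over T^c of |X_i|^p minus sum over {i in T : X_i x_i < 0}
    of |X_i|^p) / n, with T the first round(rho * n) indices and x_i > 0 on T."""
    size_t = round(rho * n)
    negative_part = 0.0
    complement = 0.0
    for i in range(n):
        g = rng.gauss(0.0, 1.0)
        if i >= size_t:
            complement += abs(g) ** p
        elif g < 0.0:  # X_i * x_i < 0 because x_i > 0 is assumed on T
            negative_part += abs(g) ** p
    return (complement - negative_part) / n

rng = random.Random(2)
margin_below = weak_margin(20000, 0.60, 0.5, rng)  # rho < 2/3
margin_above = weak_margin(20000, 0.75, 0.5, rng)  # rho > 2/3
```

The margin is positive for $\rho = 0.60$ and negative for $\rho = 0.75$, consistent with a transition at $\rho = 2/3$ that does not depend on the particular $p \in (0,1)$ chosen here.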

Theorem 6. For any $p \in [0,1)$, given any $\rho < \rho_w^* := \frac{2}{3}$, there exist constants $c_7 \in (0,1)$, $c_8 > 0$ such that when $\alpha > c_7$ and $n$ is large enough, with probability at least $1 - e^{-c_8 n}$, an $m \times n$ matrix $A$ with i.i.d. $\mathcal{N}(0,1)$ entries has the following property: for every vector $x$ on some support $T$ satisfying $|T| \le \rho n$ with a fixed sign pattern on $T$, $x$ is the unique solution to the $\ell_p$-minimization problem.

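Proof: From Lemma 4, applying arguments similar to those in the proof of Lemma 3, we get that when
$\alpha > c_7$ for some $0 < c_7 < 1$ and $n$ is large enough, with probability $1 - e^{-c_8 n}$ for some $c_8 > 0$,

• $\frac{1}{2}\rho n(\mu - \epsilon) < \sum_{i \in T: (B_i v)x_i < 0} |B_i v|^p < \frac{1}{2}\rho n(\mu + \epsilon)$

• $(1-\rho)n(\mu - \epsilon) < \sum_{i \in T^c} |B_i v|^p < (1-\rho)n(\mu + \epsilon)$

hold for all the vectors $v$ in a $\gamma$-net $\Sigma$ at the same time. Let $S$ be the unit sphere in $\mathcal{R}^{n-m}$. Pick any
$z \in S$; from (25) we have $z = \sum_{j \ge 0} \gamma_j v_j$, where $\gamma_0 = 1$, $v_j \in \Sigma$ for all $j$ and $\gamma_j \le \gamma^j$.

Given $z$, let $T^- = \{i \in T : B_i z x_i < 0\}$. For any $i$ in $T^-$,

$$|B_i z|^p = \Big|\sum_{j \ge 0} \gamma_j B_i v_j\Big|^p \le \Big|\sum_{j: (B_i v_j)x_i < 0} \gamma_j B_i v_j\Big|^p \le \sum_{j: (B_i v_j)x_i < 0} \gamma_j^p |B_i v_j|^p, \qquad (26)$$

where the first inequality holds as $(B_i z)x_i < 0$. Then

$$\begin{aligned}
\|B_{T^-} z\|_p^p &\le \sum_{i \in T^-} \sum_{j: (B_i v_j)x_i < 0} \gamma_j^p |B_i v_j|^p \\
&\le \sum_{i \in T} \sum_{j: (B_i v_j)x_i < 0} \gamma_j^p |B_i v_j|^p \\
&= \sum_{j \ge 0} \gamma_j^p \sum_{i \in T: (B_i v_j)x_i < 0} |B_i v_j|^p \\
&\le \sum_{j \ge 0} \gamma^{jp} \, \tfrac{1}{2}\rho n(\mu + \epsilon) = \frac{\rho n(\mu + \epsilon)}{2(1 - \gamma^p)}. \qquad (27)
\end{aligned}$$

We also have

$$\begin{aligned}
\|B_{T^c} z\|_p^p &\ge \|B_{T^c} v_0\|_p^p - \sum_{j \ge 1} \gamma_j^p \|B_{T^c} v_j\|_p^p \\
&\ge (1-\rho)n(\mu - \epsilon) - \sum_{j \ge 1} \gamma^{jp} (1-\rho)n(\mu + \epsilon) \\
&= (1-\rho)n \, \frac{\mu - 2\mu\gamma^p - \epsilon}{1 - \gamma^p}. \qquad (28)
\end{aligned}$$

Combining (27) and (28), we have for every $z \in S$,

$$\|B_{T^c} z\|_p^p - \|B_{T^-} z\|_p^p \ge \frac{\mu n}{1 - \gamma^p}\Big(1 - \tfrac{3}{2}\rho - 2\gamma^p(1-\rho) - \tfrac{\epsilon}{\mu}\big(1 - \tfrac{\rho}{2}\big)\Big).$$

Then for every non-zero $z \in \mathcal{R}^{n-m}$, we have

$$\|B_{T^c} z\|_p^p - \|B_{T^-} z\|_p^p \ge \|z\|_2^p \, \frac{\mu n}{1 - \gamma^p}\Big(1 - \tfrac{3}{2}\rho - 2\gamma^p(1-\rho) - \tfrac{\epsilon}{\mu}\big(1 - \tfrac{\rho}{2}\big)\Big).$$

For any $\rho < \frac{2}{3}$, we can pick $\gamma$ and $\epsilon$ small enough such that the righthand
side is positive. The result follows by applying Theorem 3 and Theorem 4.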



We remark here that $\rho_w^*$ is a sharp bound for successful recovery in this setup. For any $\rho > \rho_w^*$, from
Lemma 4, with overwhelming probability $\sum_{i \in T: X_i x_i < 0} |X_i|^p > \sum_{i \in T^c} |X_i|^p$; then Theorem 3 and
Theorem 4 indicate that $\ell_p$-minimization ($p \in [0,1)$) fails to recover some $\rho n$-sparse vector $x$ in
this case. Note that for a random Gaussian measurement matrix, from symmetry one can check that this
result does not depend on the specific choice of support and sign pattern. In fact, Theorem 6 holds for
any fixed support and any fixed sign pattern.

Surprisingly, the successful recovery threshold $\rho_w^*$ when we only consider recovering vectors on one
support with one sign pattern is $\frac{2}{3}$ for all $p$ in $[0,1)$ and is strictly less than the threshold for $p = 1$,
which is 1 ([15]). Thus in this case, $\ell_1$-minimization has better recovery performance than $\ell_p$-minimization
($p \in [0,1)$) in terms of the sparsity requirement for the sparse vector. If we view the ability to recover
all the vectors up to certain sparsity as the "worst" case performance, and the ability to recover all the
sparse vectors on one support with one sign pattern as the "expected" case performance, then although
worst case performance can be improved if we apply $\ell_p$-minimization with a smaller $p$, $\ell_1$-minimization
in fact has the best expected case performance for all $p \in [0,1]$.

It might be counterintuitive at first sight to see that the weak threshold of $\ell_0$-minimization is less than
that of $\ell_1$-minimization, so let us take a moment to consider what the result means. We choose recovering
all nonnegative vectors on some support $T$ ($|T| = \rho n$) for the weak recovery; the argument follows for
all the other supports and all the other sign patterns. The results about the weak recovery threshold indicate
that for any $\rho \in (2/3, 1)$, when $n$ is sufficiently large and $\alpha \to 1$, for a random Gaussian measurement
matrix $A$, $\ell_1$-minimization would recover all the nonnegative vectors on some support $T$ ($|T| = \rho n$) with
overwhelming probability, while $\ell_0$-minimization would fail to recover some nonnegative vector on $T$
with overwhelming probability according to Theorem 6. This can happen when there exists a nonnegative
vector $x$ on support $T$ and a vector $x'$ on support $T'$ such that $|T'| < |T|$ and $Ax = Ax'$. Note that
$x'$ could have negative entries, or $T'$ may not be a subset of $T$. Therefore, if $x$ is the sparse vector we
would like to recover from $Ax$, $\ell_0$-minimization would fail since $\|x'\|_0 < \|x\|_0$. However, $\|x\|_1 < \|x'\|_1$
should hold since $\ell_1$-minimization can successfully return $x$ as its solution. Of course when $x'$ is the
sparse vector we would like to recover, $\ell_1$-minimization would return $x$ and fail to recover $x'$. However,
since $\ell_1$-minimization would recover all the nonnegative vectors on $T$, either $T' \not\subseteq T$ holds or $x'$ has
negative entries. Therefore when we consider recovering nonnegative vectors on $T$ for the weak recovery,
$x'$ is not taken into account, and $\ell_1$-minimization works better than $\ell_0$-minimization. Therefore, although
the performance of $\ell_1$-minimization is not as good as that of $\ell_p$-minimization ($p \in [0,1)$) in the strong
recovery, which requires recovering all the vectors up to certain sparsity, $\ell_1$-minimization can recover
all the $\rho n$-sparse ($\rho > 2/3$) vectors on some support with some sign pattern, while for $\ell_p$-minimization
($p \in [0,1)$), the size of the largest support on which it can recover all the vectors with one sign pattern is
no greater than $2n/3$. Thus, when we aim to recover all the vectors up to certain sparsity, $\ell_p$-minimization
is better for smaller $p$; however, when we aim to recover all the vectors on one support with one sign
pattern, $\ell_1$-minimization may have a better performance.

IV. Recovery Bounds for Every $\lim_{n \to \infty} \frac{m}{n} < 1$

We considered the limiting case that $\alpha \to 1$ in Section III and provided the limiting thresholds of
sparsity ratio for successful recovery via $\ell_p$-minimization both in the strong sense and in the weak sense.
Here we focus on the case that $\alpha$ is given ($0 < \alpha < 1$). For any $\alpha$ and $p$, we will provide a bound $\rho^*(\alpha,p)$
for strong recovery and a bound $\rho_w^*(\alpha,p)$ for weak recovery such that $\ell_p$-minimization can recover all the
$\rho^*(\alpha,p)n$-sparse vectors with overwhelming probability, and recover all the $\rho_w^*(\alpha,p)n$-sparse vectors on
one support with one sign pattern with overwhelming probability. Note that the thresholds we provided
in Section III are tight in the sense that for any $\rho > \rho^*$ in the strong recovery or any $\rho > \rho_w^*$ in the weak
recovery, with overwhelming probability $\ell_p$-minimization would fail to recover some $\rho n$-sparse vector.
However, $\rho^*(\alpha,p)$ and $\rho_w^*(\alpha,p)$ we provide in this section are lower bounds for the thresholds of strong
recovery and weak recovery respectively, and might not be tight in general.

A. Strong Recovery 



As discussed in Section III, since $A$ has i.i.d. $\mathcal{N}(0,1)$ entries, there exists a basis $B$ of the null space of
$A$ with i.i.d. $\mathcal{N}(0,1)$ entries. Let $S$ be the unit sphere in $\mathcal{R}^{n-m}$. From Theorem 1 we know that in order
to successfully recover all the $\rho n$-sparse vectors via $\ell_p$-minimization, $\|B_T z\|_p^p < \frac{1}{2}\|Bz\|_p^p$ should hold
for every non-zero vector $z \in \mathcal{R}^{n-m}$, and every set $T \subset \{1, \ldots, n\}$ with $|T| \le \rho n$. We will first establish a
lower bound of $\|Bz\|_p^p$ for all $z \in S$ with overwhelming probability in Lemma 5. Lemma 6 establishes
the fact that for any given constant $c > 0$, there always exists some $\rho > 0$ such that $\|B_T z\|_p^p \le cn$ for
all $z \in S$ and all $T$ with $|T| \le \rho n$ with overwhelming probability. Combining Lemma 5 and Lemma
6, we will establish a positive lower bound $\rho^*(\alpha,p)$ of the sparsity ratio for successful recovery for every
$\alpha \in (0,1)$ and every $p \in (0,1]$ in Theorem 7.

Lemma 5. For any $\alpha$ and $p$, there exists a constant $\lambda_{\min}(\alpha,p) > 0$ and some constant $c_9 > 0$ such that
with probability at least $1 - e^{-c_9 n}$, for every $z \in S$, $\|Bz\|_p^p \ge \lambda_{\min}(\alpha,p)n$.

Lemma 6. Given any $\alpha$, $p$ and corresponding $\lambda_{\min}(\alpha,p) > 0$, there exists a constant $\rho^*(\alpha,p) > 0$ and
some constant $c_{10} > 0$ such that with probability at least $1 - e^{-c_{10} n}$, for every $z \in S$ and for every set
$T \subset \{1, 2, \ldots, n\}$ with $|T| \le \rho^*(\alpha,p)n$, $\|B_T z\|_p^p < \frac{1}{2}\lambda_{\min}(\alpha,p)n$.

We defer the proofs of Lemma 5 and Lemma 6 for later discussion, and first present our result on
bounds for strong recovery of $\ell_p$-minimization with given $\alpha \in (0,1)$.

Theorem 7. For any $0 < p \le 1$, for matrix $A^{m \times n}$ ($\alpha = \frac{m}{n}$) with i.i.d. $\mathcal{N}(0,1)$ entries, there exists
a constant $c_{11} > 0$ such that with probability at least $1 - e^{-c_{11} n}$, $x$ is the unique solution to the
$\ell_p$-minimization problem (3) for every vector $x$ up to $\rho^*(\alpha,p)n$-sparse.



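Proof: Let $S$ be the unit sphere in $\mathcal{R}^{n-m}$. Then

$$\begin{aligned}
&P(\text{Strong recovery succeeds to recover vectors up to } \rho^*(\alpha,p)n\text{-sparse}) \\
&= P\big(\forall \text{ non-zero } z \in \mathcal{R}^{n-m}, \forall T \text{ with } |T| = \rho^*(\alpha,p)n, \ \|B_T z\|_p^p < \tfrac{1}{2}\|Bz\|_p^p\big) \\
&= P\big(\forall z \in S, \forall T \text{ with } |T| = \rho^*(\alpha,p)n, \ \|B_T z\|_p^p < \tfrac{1}{2}\|Bz\|_p^p\big) \\
&\ge P\big(\forall z \in S, \forall T \text{ with } |T| = \rho^*(\alpha,p)n, \ \|B_T z\|_p^p < \tfrac{1}{2}\lambda_{\min}(\alpha,p)n, \text{ and } \|Bz\|_p^p \ge \lambda_{\min}(\alpha,p)n\big) \\
&\ge 1 - P\big(\exists z \in S \text{ s.t. } \|Bz\|_p^p < \lambda_{\min}(\alpha,p)n\big) \\
&\quad - P\big(\exists z \in S, \exists T \text{ with } |T| = \rho^*(\alpha,p)n \text{ s.t. } \|B_T z\|_p^p \ge \lambda_{\min}(\alpha,p)n/2\big) \\
&\ge 1 - e^{-c_9 n} - e^{-c_{10} n}, \qquad (29)
\end{aligned}$$

where the first equality follows from Theorem 1, and the second equality holds since for any non-zero
$z \in \mathcal{R}^{n-m}$, $z/\|z\|_2 \in S$. From Lemma 5 we know there exists $c_9 > 0$ such that
$P(\exists z \in S \text{ s.t. } \|Bz\|_p^p < \lambda_{\min}(\alpha,p)n) \le e^{-c_9 n}$, and from Lemma 6 we know there exists $c_{10} > 0$ such that
$P(\exists z \in S, \exists T \text{ s.t. } \|B_T z\|_p^p \ge \frac{1}{2}\lambda_{\min}(\alpha,p)n) \le e^{-c_{10} n}$; then there exists $c_{11} > 0$, which depends on
$\alpha$, $p$ and $\lambda_{\min}$, such that (29) $\ge 1 - e^{-c_{11} n}$. Therefore, $\ell_p$-minimization can recover all the
$\rho^*(\alpha,p)n$-sparse vectors with probability at least $1 - e^{-c_{11} n}$.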

Theorem 7 implies that for every $\alpha \in (0,1)$ and every $p \in (0,1]$, there exists a positive constant
$\rho^*(\alpha,p)$ such that $\ell_p$-minimization can recover all the $\rho^* n$-sparse vectors with overwhelming probability.
Since $\rho^*(\alpha,p)$ is a lower bound of the threshold of the strong recovery, we want it to be as high as
possible. Next we show how to calculate $\rho^*(\alpha,p)$ and improve it as much as possible. In order to
calculate $\rho^*(\alpha,p)$, we first calculate $\lambda_{\min}(\alpha,p)$ in Lemma 5, and then with the obtained $\lambda_{\min}(\alpha,p)$, we
can calculate $\rho^*(\alpha,p)$ in Lemma 6. We want to obtain $\lambda_{\min}(\alpha,p)$ which is as large as possible while
Lemma 5 still holds, and given $\lambda_{\min}(\alpha,p)$, we want $\rho^*(\alpha,p)$ to be as large as possible while Lemma 6
still holds. How to calculate $\lambda_{\min}(\alpha,p)$ and $\rho^*(\alpha,p)$ is stated in the following text, and Lemma 5 and
Lemma 6 are proved in the meantime. The values of $\lambda_{\min}(\alpha,p)$ and $\rho^*(\alpha,p)$ can be computed from (38)
and (43).

1) Calculation of $\lambda_{\min}(\alpha,p)$ in Lemma 5
Given $\alpha$ and $p$, define

$$c_{\max} = \frac{1}{n} \sup_{z \in S} \|Bz\|_p^p = \frac{1}{n} \max_{z \in S} \|Bz\|_p^p,$$

where the second equality holds by compactness. Thus, for any non-zero vector $z$, $\|Bz\|_p^p \le \|z\|_2^p c_{\max} n$.

Define

$$c_{\min} = \frac{1}{n} \min_{z \in S} \|Bz\|_p^p.$$

Pick a $\gamma$-net $\Sigma_2$ of $S$ with cardinality at most $(1 + 2/\gamma)^{n-m}$ [28], with $\gamma > 0$ to be chosen later, and define

$$\theta = \frac{1}{n} \min_{z \in \Sigma_2} \|Bz\|_p^p.$$

Then for every $z \in S$, there exists $z' \in \Sigma_2$ such that $\|z - z'\|_2 \le \gamma$. We have

$$\|Bz\|_p^p \ge \|Bz'\|_p^p - \|B(z - z')\|_p^p \ge \theta n - \gamma^p c_{\max} n, \qquad (30)$$

where the first inequality follows from the triangle inequality and the second inequality follows from the
definition of $c_{\max}$. Since (30) holds for every $z$ in $S$, we have

$$c_{\min} \ge \theta - \gamma^p c_{\max}. \qquad (31)$$

To calculate $\lambda_{\min}(\alpha,p)$, we essentially need to characterize $c_{\min}$. From (31), we can achieve this by
characterizing $\theta$ and $c_{\max}$.

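We first show that there exists a constant $b > 0$ such that with overwhelming probability, $\theta \ge b$ holds,
i.e. $\|Bz\|_p^p \ge bn$ for all $z$ in $\Sigma_2$.

$$\begin{aligned}
P(\theta < b) &= P(\exists z \in \Sigma_2 \text{ s.t. } \|Bz\|_p^p < bn) \\
&\le \sum_{z \in \Sigma_2} P(\|Bz\|_p^p < bn) \\
&\le (1 + 2/\gamma)^{n-m} e^{tbn} E[e^{-t|X|^p}]^n, \quad \forall t > 0 \\
&= (1 + 2/\gamma)^{(1-\alpha)n} e^{tbn} E[e^{-t|X|^p}]^n, \quad \forall t > 0 \\
&= e^{\left((1-\alpha)\log(1+2/\gamma) + \log(E[e^{-t|X|^p}]) + bt\right)n}, \quad \forall t > 0, \qquad (32)
\end{aligned}$$

where $X \sim \mathcal{N}(0,1)$. The first inequality follows from the union bound and the fact that $P(\|Bz\|_p^p < bn)$
is the same for all $z \in \Sigma_2$ since $B$ has i.i.d. $\mathcal{N}(0,1)$ entries. The second inequality follows from the
Chernoff bound. Note that

$$\begin{aligned}
E[e^{-t|X|^p}] &= \sqrt{2/\pi} \int_0^\infty e^{-t x^p} e^{-\frac{1}{2}x^2} dx \\
&= t^{-\frac{1}{p}} \sqrt{2/\pi} \int_0^\infty e^{-y^p} e^{-\frac{1}{2}(t^{-\frac{1}{p}} y)^2} dy \qquad (33) \\
&\le t^{-\frac{1}{p}} \sqrt{2/\pi} \int_0^\infty e^{-y^p} dy \\
&= t^{-\frac{1}{p}} \sqrt{2/\pi} \, \Gamma(1/p)/p, \qquad (34)
\end{aligned}$$

where (33) holds from changing variables using $x = t^{-\frac{1}{p}} y$, and the inequality follows from the fact that
$e^{-\frac{1}{2}(t^{-\frac{1}{p}} y)^2} \le 1$ for all $y \ge 0$. If it further holds that $t \ge 1$, then $t^{-\frac{1}{p}} \le 1$. Then from (33) we have

$$E[e^{-t|X|^p}] \ge t^{-\frac{1}{p}} \sqrt{2/\pi} \int_0^\infty e^{-y^p} e^{-\frac{1}{2}y^2} dy. \qquad (35)$$

Since $\int_0^\infty e^{-y^p} e^{-\frac{1}{2}y^2} dy$ exists and is positive, combining (34) and (35), we have

$$E[e^{-t|X|^p}] = \Theta(t^{-\frac{1}{p}}).$$

Since (32) holds for all $t > 0$, we let $t = \gamma^{-p(1-\alpha+\epsilon)}$ for any $\epsilon$ such that $0 < \epsilon < \alpha$ and let $b(\gamma) = 1/t$;
then from (32) we have

$$P(\theta < b(\gamma)) \le e^{-c(\gamma) n},$$

where $c(\gamma) = -(1-\alpha)\log(1 + \frac{2}{\gamma}) - \log(\Theta(\gamma^{1-\alpha+\epsilon})) - 1$. Note that since $\epsilon > 0$, when $\gamma$ is sufficiently
small, $c(\gamma) > 0$. Therefore when $\gamma < \xi$ for some small $\xi > 0$, there exists a constant $c(\gamma) > 0$ such that

$$P(\theta < b(\gamma) = \gamma^{p(1-\alpha+\epsilon)}) \le e^{-c(\gamma) n}. \qquad (36)$$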
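The upper bound (34) on $E[e^{-t|X|^p}]$ can be sanity-checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the expectation is approximated by a plain midpoint rule on a truncated interval, which is an assumption about acceptable accuracy rather than part of the analysis.

```python
import math

def mgf_neg(t, p, upper=10.0, steps=100000):
    """Midpoint-rule estimate of E[exp(-t |X|^p)] for X ~ N(0, 1).

    Uses E[.] = sqrt(2/pi) * integral_0^inf exp(-t x^p - x^2 / 2) dx,
    truncated to [0, upper] (the Gaussian tail beyond is negligible).
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += math.exp(-t * x ** p - 0.5 * x * x)
    return math.sqrt(2.0 / math.pi) * h * total

def bound_34(t, p):
    """Right-hand side of (34): t^(-1/p) * sqrt(2/pi) * Gamma(1/p) / p."""
    return t ** (-1.0 / p) * math.sqrt(2.0 / math.pi) * math.gamma(1.0 / p) / p

if __name__ == "__main__":
    for p in (0.5, 1.0):
        for t in (1.0, 10.0):
            print(p, t, mgf_neg(t, p), bound_34(t, p))
```

As $t$ grows, the estimate approaches the bound, in line with $E[e^{-t|X|^p}] = \Theta(t^{-1/p})$.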

We next show that there exists some $\lambda_{\max}(\alpha,p) > 0$ such that with overwhelming probability, $c_{\max} \le
\lambda_{\max}(\alpha,p)$ holds. In fact, we have the following lemma:

Lemma 7. Given any $\alpha$ and $p$, there exists a constant $\lambda_{\max}(\alpha,p) > 0$ and some constant $c_{12} > 0$ such
that with probability at least $1 - e^{-c_{12} n}$, for every $z \in S$, $\|Bz\|_p^p \le \lambda_{\max}(\alpha,p)n$.

Lemma 7 indicates that there exist $\lambda_{\max}(\alpha,p)$ and $c_{12} > 0$ such that

$$P(c_{\max} \le \lambda_{\max}(\alpha,p)) \ge 1 - e^{-c_{12} n}. \qquad (37)$$

Please refer to the Appendix for the calculation of $\lambda_{\max}(\alpha,p)$; Lemma 7 is proved in the meantime.
In order to obtain a good bound of the recovery threshold, we want $\lambda_{\max}(\alpha,p)$ to be as small as possible
while Lemma 7 still holds. The numerical value of $\lambda_{\max}(\alpha,p)$ can be computed from (50).
Then after characterizing $\theta$ and $c_{\max}$ separately, we are ready to characterize $c_{\min}$.

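$$\begin{aligned}
&P(c_{\min} < \gamma^{p(1-\alpha+\epsilon)} - \gamma^p \lambda_{\max}(\alpha,p)) \\
&\le P(\theta - \gamma^p c_{\max} < \gamma^{p(1-\alpha+\epsilon)} - \gamma^p \lambda_{\max}(\alpha,p)) \\
&\le P(\theta < \gamma^{p(1-\alpha+\epsilon)}) + P(c_{\max} > \lambda_{\max}(\alpha,p)) \\
&\le e^{-c(\gamma) n} + e^{-c_{12} n},
\end{aligned}$$

where the first inequality follows from (31), and the last inequality follows from (36) and (37). Then for
any $\gamma < \xi$ there exists a constant $c_9 > 0$ such that $P(c_{\min} < \gamma^{p(1-\alpha+\epsilon)} - \gamma^p \lambda_{\max}(\alpha,p)) \le e^{-c_9 n}$. Given
$\lambda_{\max}(\alpha,p)$, let

$$\lambda_{\min}(\alpha,p) = \max_{0 < \gamma < \xi} \ \gamma^{p(1-\alpha+\epsilon)} - \gamma^p \lambda_{\max}(\alpha,p). \qquad (38)$$

Note that since $1 - \alpha + \epsilon < 1$, $\gamma^{p(1-\alpha+\epsilon)} - \gamma^p \lambda_{\max} > 0$ when $\gamma$ is sufficiently small; therefore $\lambda_{\min} > 0$,
and Lemma 5 follows.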
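The maximization in (38) is one-dimensional and can be sketched with a simple grid search. This is a hedged illustration, not the computation used for the figures: `lam_max` is a hypothetical input standing in for the value obtained from (50), `eps` is the free parameter $\epsilon \in (0, \alpha)$, and `gamma_cap` stands in for the upper limit on $\gamma$ in (38), whose actual value (the range where $c(\gamma) > 0$) is not computed here.

```python
def lambda_min_grid(alpha, p, lam_max, eps, gamma_cap=1.0, grid=100000):
    """Grid-search the objective of (38) over 0 < gamma < gamma_cap.

    Returns (best value, maximizing gamma). All inputs other than
    alpha and p are illustrative assumptions (see the text above).
    """
    a = 1.0 - alpha + eps  # exponent factor; a < 1 because eps < alpha
    best_val, best_gamma = float("-inf"), None
    for k in range(1, grid):
        g = gamma_cap * k / grid
        val = g ** (p * a) - g ** p * lam_max
        if val > best_val:
            best_val, best_gamma = val, g
    return best_val, best_gamma

if __name__ == "__main__":
    print(lambda_min_grid(alpha=0.8, p=0.5, lam_max=2.0, eps=0.1))
```

With $u = \gamma^p$ and $a = 1 - \alpha + \epsilon$, the objective is $u^a - \lambda_{\max} u$, whose unconstrained maximum is $(1-a)(a/\lambda_{\max})^{a/(1-a)} > 0$; the grid search should agree with this whenever the maximizer lies inside the allowed range of $\gamma$.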

2) Calculation of $\rho^*(\alpha,p)$ in Lemma 6

For any given set $T \subset \{1, 2, \ldots, n\}$ with $|T| = \rho n$ ($0 < \rho < 1$), define

$$d_{\max} = \frac{1}{n} \max_{z \in S} \|B_T z\|_p^p.$$

Given a $\gamma$-net $\Sigma_3$ of $S$ with cardinality at most $(1 + 2/\gamma)^{n-m}$ and $\gamma > 0$ to be chosen later, define

$$\tau = \frac{1}{n} \max_{z \in \Sigma_3} \|B_T z\|_p^p.$$

Then for every $z \in S$, there exists $z' \in \Sigma_3$ such that $\|z - z'\|_2 \le \gamma$. Then for every $z \in S$, we have
$\|B_T z\|_p^p \le \|B_T z'\|_p^p + \|B_T(z - z')\|_p^p \le \tau n + \gamma^p d_{\max} n$. Thus,

$$d_{\max} \le \tau/(1 - \gamma^p). \qquad (39)$$

Given $\lambda_{\min}(\alpha,p)$ (denoted by $\lambda_{\min}$ here for simplicity), in order to obtain $\rho^*(\alpha,p)$ such that Lemma 6
holds, we essentially need to find $\rho$ such that for any $T$ with its corresponding $d_{\max}$, with overwhelming
probability $d_{\max} < \lambda_{\min}/2$ holds for all $T$ with $|T| = \rho n$ at the same time. From (39), we first consider
the probability that $\tau \ge \lambda_{\min}(1 - \gamma^p)/2$ holds for a given set $T$.

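$$\begin{aligned}
&P(\tau \ge \lambda_{\min}(1 - \gamma^p)/2, \text{ given } T) \\
&= P(\exists z \in \Sigma_3 \text{ s.t. } \|B_T z\|_p^p \ge \lambda_{\min}(1 - \gamma^p)n/2) \\
&\le (1 + 2/\gamma)^{n-m} P(\|B_T z\|_p^p \ge \lambda_{\min}(1 - \gamma^p)n/2) \\
&\le (1 + 2/\gamma)^{n-m} \min_{t > 0} e^{-t\lambda_{\min}(1-\gamma^p)n/2} E[e^{t\sum_{i \in T} |B_i z|^p}] \\
&= (1 + 2/\gamma)^{(1-\alpha)n} \min_{t > 0} e^{-t\lambda_{\min}(1-\gamma^p)n/2} E[e^{t|X|^p}]^{\rho n} \\
&= e^{\left((1-\alpha)\log(1+2/\gamma) + \min_{t > 0}\left(\rho\log(E[e^{t|X|^p}]) - t\lambda_{\min}(1-\gamma^p)/2\right)\right)n}, \qquad (40)
\end{aligned}$$

where $X \sim \mathcal{N}(0,1)$, the first inequality follows from the union bound and the fact that
$P(\|B_T z\|_p^p \ge \lambda_{\min}(1 - \gamma^p)n/2)$ is the same for all $z \in \Sigma_3$, and the second
inequality follows from the Chernoff bound. Note that since $B$ has i.i.d. $\mathcal{N}(0,1)$ entries, (40) holds for
any $T$ as long as $|T| = \rho n$.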

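Given $\rho$, $\lambda_{\min}$ and $\gamma$, since the second derivative of $\rho\log(E[e^{t|X|^p}]) - t\lambda_{\min}(1 - \gamma^p)/2$ with respect to $t$
is positive, its minimum is achieved where its first derivative is 0.

$$\begin{aligned}
&\frac{d\left[\rho\log(E[e^{t|X|^p}]) - t\lambda_{\min}(1 - \gamma^p)/2\right]}{dt} \\
&= \frac{d\left(\rho\log\left(\sqrt{2/\pi}\int_0^\infty e^{t x^p - \frac{1}{2}x^2} dx\right) - t\lambda_{\min}(1 - \gamma^p)/2\right)}{dt} \\
&= \rho \, \frac{\int_0^\infty x^p e^{t x^p - \frac{1}{2}x^2} dx}{\int_0^\infty e^{t x^p - \frac{1}{2}x^2} dx} - \lambda_{\min}(1 - \gamma^p)/2. \qquad (41)
\end{aligned}$$

Note that when $\rho < \lambda_{\min}(1 - \gamma^p)/(2E[|X|^p])$, the value of $t$ at which (41) equals zero is always positive; thus it
is also the solution to $\min_{t > 0}(\rho\log(E[e^{t|X|^p}]) - t\lambda_{\min}(1 - \gamma^p)/2)$. Now consider the probability that
$\|B_T z\|_p^p \ge \frac{1}{2}\lambda_{\min} n$ for some $z \in S$ and $T$ with $|T| = \rho n$.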

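$$\begin{aligned}
&P(\exists z \in S, \exists T \text{ s.t. } |T| = \rho n, \ \|B_T z\|_p^p \ge \lambda_{\min} n/2) \\
&\le \binom{n}{\rho n} P(\exists z \in S \text{ s.t. } \|B_T z\|_p^p \ge \lambda_{\min} n/2, \text{ for given } T \subset \{1, 2, \ldots, n\} \text{ and } |T| = \rho n) \\
&\le \binom{n}{\rho n} P(\tau/(1 - \gamma^p) \ge \lambda_{\min}/2) \\
&= \binom{n}{\rho n} P(\tau \ge \lambda_{\min}(1 - \gamma^p)/2) \\
&\le 2^{nH(\rho)} e^{\left((1-\alpha)\log(1+2/\gamma) + \min_{t > 0}\left(\rho\log(E[e^{t|X|^p}]) - t\lambda_{\min}(1-\gamma^p)/2\right)\right)n} \\
&= e^{\left(H(\rho)\log 2 + (1-\alpha)\log(1+2/\gamma) + \min_{t > 0}\left(\rho\log(E[e^{t|X|^p}]) - t\lambda_{\min}(1-\gamma^p)/2\right)\right)n}, \qquad (42)
\end{aligned}$$

where the first inequality follows from the union bound and the second inequality follows from (39).
Note that given $\alpha$, $p$, and $\lambda_{\min}$, for every $\gamma$, as $\rho \to 0$, $H(\rho)$ goes to 0, and
$\min_{t > 0}(\rho\log(E[e^{t|X|^p}]) - t\lambda_{\min}(1 - \gamma^p)/2)$ goes to $-\infty$; thus, there exists $\rho(\alpha,p,\gamma) > 0$ such that the
exponent of (42) is negative for all $\rho \le \rho(\alpha,p,\gamma)$. In other words, for each $\gamma$, there exists some $c_{10} > 0$
such that (42) $\le e^{-c_{10} n}$ when $\rho = \rho(\alpha,p,\gamma)$. Then, with probability at least $1 - e^{-c_{10} n}$, for every $z \in S$
and for every set $T \subset \{1, 2, \ldots, n\}$ with $|T| \le \rho(\alpha,p,\gamma)n$, $\|B_T z\|_p^p < \lambda_{\min} n/2$. Let

$$\rho^*(\alpha,p) = \max_{\gamma} \rho(\alpha,p,\gamma), \qquad (43)$$

then Lemma 6 follows.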
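The per-$n$ exponent in (42) can be evaluated numerically to see for which $\rho$ it is negative. The sketch below is only illustrative: `lam_min` is a hypothetical input standing in for the value from (38), the expectation is approximated by a midpoint rule, and the minimization over $t$ is a ternary search restricted to $[0, t_{\max}]$. Restricting $t$ can only increase the computed exponent, so a negative result remains a valid (if conservative) indication that the bound decays.

```python
import math

def log_mgf(t, p, upper=16.0, steps=4000):
    """Midpoint-rule estimate of log E[exp(t |X|^p)], X ~ N(0, 1)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += math.exp(t * x ** p - 0.5 * x * x)
    return math.log(math.sqrt(2.0 / math.pi) * h * total)

def min_over_t(rho, a, p, t_max=8.0, iters=60):
    """Ternary search for min over t in [0, t_max] of rho*log_mgf(t) - t*a.

    The objective is convex in t, so ternary search applies.
    """
    f = lambda t: rho * log_mgf(t, p) - t * a
    lo, hi = 0.0, t_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f(0.5 * (lo + hi))

def binary_entropy(rho):
    """H(rho) in bits."""
    return -rho * math.log2(rho) - (1.0 - rho) * math.log2(1.0 - rho)

def exponent_42(rho, alpha, p, gamma, lam_min):
    """Exponent of (42) divided by n (lam_min is a hypothetical input)."""
    a = lam_min * (1.0 - gamma ** p) / 2.0
    return (binary_entropy(rho) * math.log(2.0)
            + (1.0 - alpha) * math.log(1.0 + 2.0 / gamma)
            + min_over_t(rho, a, p))

if __name__ == "__main__":
    for rho in (0.01, 0.5):
        print(rho, exponent_42(rho, alpha=0.99, p=1.0, gamma=0.5, lam_min=0.5))
```

Scanning $\rho$ downward from a value with a positive exponent until the exponent turns negative gives a numerical stand-in for $\rho(\alpha,p,\gamma)$; repeating this over $\gamma$ mirrors the maximization in (43).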

Theorem 7 establishes the existence of $\rho^*(\alpha,p) > 0$ for all $0 < \alpha < 1$ and $0 < p \le 1$ such that
$\ell_p$-minimization can recover all the $\rho^*(\alpha,p)n$-sparse vectors with overwhelming probability. We numerically
calculate this bound by calculating first $\lambda_{\max}(\alpha,p)$ in Lemma 7 from (50), and then $\lambda_{\min}(\alpha,p)$ in Lemma
5 from (38), and finally $\rho^*(\alpha,p)$ in Lemma 6 from (43). Fig. 2 shows the curve of $\rho^*(\alpha,p)$ against $\alpha$
for different $p$, and Fig. 3 shows the curve of $\rho^*(\alpha,p)$ against $p$ for different $\alpha$. Note that for any $p$,
$\lim_{\alpha \to 1} \rho^*(\alpha,p)$ is slightly smaller than the limiting threshold of strong recovery we obtained in Section
III-A. For example, when $p = 0.5$, the threshold $\rho^*(0.5)$ we obtained in Section III-A is 0.3406, and the
bound $\rho^*(\alpha, 0.5)$ we obtained here is approximately 0.268 when $\alpha$ goes to 1. This is because in Section
III-A we employed a finer technique to characterize the sum of the largest $\rho n$ terms of $n$ i.i.d. random
variables directly, while in Section IV-A introducing the union bound causes some slackness.

Compared with the bound obtained in [4] through the restricted isometry condition, our bound $\rho^*(\alpha,p)$ is
tighter when $\alpha$ is relatively large. For example, when $p = 1$, the bound in [4] (Fig. 3.2(a)) is in the order
of $10^{-3}$ for all $\alpha \in (0,1)$ and upper bounded by 0.0035, while $\rho^*(\alpha, 1)$ is greater than 0.0039 for all
$\alpha > 0.8$ and increases to 0.1308 as $\alpha \to 1$. When $p = 0.5$, the bound in [4] (Fig. 3.2(c)) is
upper bounded by 0.01 for all $\alpha \in (0,1)$, while here $\rho^*(\alpha, 0.5)$ is greater than 0.011 for all
$\alpha > 0.65$ and increases to 0.268 as $\alpha \to 1$. Therefore, although [4] provides a better bound than ours
when $\alpha$ is small, our bound $\rho^*$ improves over that in [4] when $\alpha$ is relatively large. [15] applies the geometric
face counting technique to the strong bound of successful recovery of $\ell_1$-minimization (Fig. 1.1). Since
if the necessary and sufficient condition (4) is satisfied for $p = 1$ then it is also satisfied for all $p < 1$,
the bound in [17] can serve as the bound of successful recovery for all $0 < p < 1$. Our bound
$\rho^*(\alpha,p)$ in Section IV is higher than that in [15] when $\alpha$ is relatively large.



B. Weak Recovery 

Theorem 3 provides a sufficient condition for successful recovery of every $\rho n$-sparse vector $x$ on one
support $T$ with one sign pattern, which requires $\|B_{T^-} z\|_p^p < \|B_{T^c} z\|_p^p$ to hold for all non-zero $z \in \mathcal{R}^{n-m}$,
where given $z$, $T^- = \{i \in T : B_i z x_i < 0\}$. Given $\alpha$, $p$ and $\rho \in (0,1)$, we will establish a lower bound of
$\|B_{T^c} z\|_p^p$ for all $z \in S$ in Lemma 8, and establish an upper bound of $\|B_{T^-} z\|_p^p$ in Lemma 9. If there
exists $\rho_w^*(\alpha,p) > 0$ such that the corresponding lower bound of $\|B_{T^c} z\|_p^p$ is greater than the upper bound
of $\|B_{T^-} z\|_p^p$, which in fact is always true as we will see in Theorem 8, then $\rho_w^*(\alpha,p)$ serves as a lower
bound of the recovery threshold of $\ell_p$-minimization for vectors on a fixed support with a fixed sign pattern.




Fig. 2. $\rho^*(\alpha,p)$ against $\alpha$ for different $p$

Fig. 3. $\rho^*(\alpha,p)$ against $p$ for different $\alpha$



The technique to establish the lower bound of $\|B_{T^c} z\|_p^p$ for all $z \in S$ is the same as that in Lemma 5.
We state the result in Lemma 8; please refer to the Appendix for its proof.

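Lemma 8. Given $\alpha$, $\rho$ and set $T \subset \{1, \ldots, n\}$ with $|T| = \rho n$, with probability at least $1 - e^{-c_{13} n}$ for
some $c_{13} > 0$, for all $z \in S$, $\|B_{T^c} z\|_p^p \le (1-\rho)\lambda_{\max}(\frac{\alpha-\rho}{1-\rho}, p)n$, and with probability at least $1 - e^{-c_{14} n}$
for some $c_{14} > 0$, for all $z \in S$, $\|B_{T^c} z\|_p^p \ge (1-\rho)\lambda_{\min}(\frac{\alpha-\rho}{1-\rho}, p)n$, where $\lambda_{\max}(\alpha,p)$ and $\lambda_{\min}(\alpha,p)$
are defined in (50) and (38) respectively.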



Given $T$ with $|T| = \rho n$, Lemma 8 provides a lower bound of $\|B_{T^c} z\|_p^p$ which holds with overwhelming
probability for all $z \in S$. Next we will provide an upper bound
of $\|B_{T^-} z\|_p^p$ for all $z \in S$ in Lemma 9. One should be cautious that the set $T^-$ varies for different $z$. To
improve the bound of the threshold of successful weak recovery, we want $\lambda_{\max}(\alpha,p,\rho)$ to be as small
as possible while Lemma 9 still holds. $\lambda_{\max}(\alpha,p,\rho)$ can be computed from (57); please refer to the
Appendix for its detailed calculation.

Lemma 9. Given $\alpha$, $\rho$ and set $T \subset \{1, \ldots, n\}$ with $|T| = \rho n$, with probability at least $1 - e^{-c_{15} n}$ for
some $c_{15} > 0$, for every $z \in S$, $\|B_{T^-} z\|_p^p \le \rho\lambda_{\max}(\alpha,p,\rho)n$, for some $\lambda_{\max}(\alpha,p,\rho) > 0$.

With the help of Lemma 8 and Lemma 9, we are ready to present the result regarding the lower bound
of the recovery threshold via $\ell_p$-minimization in the weak sense for given $\alpha$.

Theorem 8. For any $0 < p < 1$, for matrix $A^{m \times n}$ with i.i.d. $\mathcal{N}(0,1)$ entries, there exist constants
$\rho_w^*(\alpha,p) > 0$ and $c_{16} > 0$ such that with probability at least $1 - e^{-c_{16} n}$, $x$ is the unique solution to
the $\ell_p$-minimization problem (3) for every $\rho_w^*(\alpha,p)n$-sparse vector $x$ on one support $T$ with one sign
pattern.

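Proof: Note that given $p$ and $\alpha$, $\lambda_{\max}(\alpha,p,\rho)$ and $\lambda_{\min}(\frac{\alpha-\rho}{1-\rho}, p)$ are both positive for all
$\rho \in (0,1)$, and one can check from the definition of $\lambda_{\max}(\alpha,p,\rho)$ and $\lambda_{\min}(\frac{\alpha-\rho}{1-\rho}, p)$ that when $\rho$
decreases, $\lambda_{\max}(\alpha,p,\rho)$ is non-increasing, and $\lambda_{\min}(\frac{\alpha-\rho}{1-\rho}, p)$ is non-decreasing. Therefore, there always
exists $\rho_w^*(\alpha,p) > 0$ (denoted by $\rho_w^*$ for simplicity here) such that

$$\rho_w^* \lambda_{\max}(\alpha,p,\rho_w^*) < (1 - \rho_w^*)\lambda_{\min}\Big(\frac{\alpha - \rho_w^*}{1 - \rho_w^*}, p\Big). \qquad (44)$$

Now consider the probability that $\ell_p$-minimization can recover all the $\rho_w^* n$-sparse $x$ on one fixed support
$T$ with one fixed sign pattern. From Theorem 3 we know that $\|B_{T^-} z\|_p^p < \|B_{T^c} z\|_p^p$ for all non-zero
$z \in \mathcal{R}^{n-m}$ is a sufficient condition for the success of weak recovery, thus

$$\begin{aligned}
&P(\text{Weak recovery succeeds up to } \rho_w^* n\text{-sparse}) \\
&\ge P(\forall \text{ non-zero } z \in \mathcal{R}^{n-m}, \ \|B_{T^-} z\|_p^p < \|B_{T^c} z\|_p^p) \\
&= P(\forall z \in S, \ \|B_{T^-} z\|_p^p < \|B_{T^c} z\|_p^p) \\
&\ge P\Big(\forall z \in S, \ \|B_{T^-} z\|_p^p \le \rho_w^* \lambda_{\max}(\alpha,p,\rho_w^*)n, \text{ and } \|B_{T^c} z\|_p^p \ge (1 - \rho_w^*)\lambda_{\min}\Big(\frac{\alpha - \rho_w^*}{1 - \rho_w^*}, p\Big)n\Big) \\
&\ge 1 - e^{-c_{14} n} - e^{-c_{15} n}, \qquad (45)
\end{aligned}$$

where the equality holds since for any non-zero $z \in \mathcal{R}^{n-m}$, $z/\|z\|_2 \in S$, and the second inequality follows
from (44). From Lemma 8 we know there exists $c_{14} > 0$ such that
$P(\forall z \in S, \ \|B_{T^c} z\|_p^p \ge (1 - \rho_w^*)\lambda_{\min}(\frac{\alpha - \rho_w^*}{1 - \rho_w^*}, p)n) \ge 1 - e^{-c_{14} n}$, and from Lemma 9
we know there exists $c_{15} > 0$ such that $P(\forall z \in S, \ \|B_{T^-} z\|_p^p \le \rho_w^* \lambda_{\max}(\alpha,p,\rho_w^*)n) \ge 1 - e^{-c_{15} n}$;
then (45) holds. Thus, there exists $c_{16} > 0$ such that with probability
at least $1 - e^{-c_{16} n}$, $\ell_p$-minimization can recover all $\rho_w^* n$-sparse vectors on fixed support $T$ with
fixed sign pattern. ■

Fig. 4. $\rho_w^*(\alpha,p)$ against $\alpha$ for different $p$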
Theorem 8 establishes the existence of a positive bound $\rho_w^*(\alpha,p)$ and defines $\rho_w^*(\alpha,p)$ in (44). To
obtain $\rho_w^*(\alpha,p)$, we first calculate $\lambda_{\min}(\frac{\alpha-\rho}{1-\rho}, p)$ in Lemma 8 from (38) and $\lambda_{\max}(\alpha,p,\rho)$ in Lemma 9
from (57) for every $\rho$, then find the largest $\rho_w^*(\alpha,p)$ such that (44) holds. We numerically calculate this
bound and illustrate the results in Fig. 4 and Fig. 5. Fig. 4 shows the curve of $\rho_w^*(\alpha,p)$ against $\alpha$ for
different $p$, and Fig. 5 shows the curve of $\rho_w^*(\alpha,p)$ against $p$ for different $\alpha$. When $\alpha \to 1$, $\rho_w^*(\alpha,p)$
goes to 2/3 for all $p \in (0,1)$, which coincides with the limiting threshold discussed in Section III-B. As
indicated in Fig. 1.2 of [18], the weak recovery threshold of $\ell_1$-minimization is greater than 2/3 for all $\alpha$
greater than 0.9. Since the weak recovery threshold of $\ell_p$-minimization ($p \in [0,1)$) when $\alpha \to 1$
is 2/3, for all $\alpha > 0.9$ the weak recovery threshold of $\ell_1$-minimization is greater than that
of $\ell_p$-minimization for all $p \in [0,1)$.

Fig. 5. $\rho_w^*(\alpha,p)$ against $p$ for different $\alpha$

V. $\ell_1$-MINIMIZATION CAN PERFORM BETTER THAN $\ell_p$-MINIMIZATION ($p \in [0,1)$) FOR SPARSE RECOVERY

For strong recovery, if $\ell_1$-minimization can recover all the $k$-sparse vectors, then $\ell_p$-minimization is
also guaranteed to recover all the $k$-sparse vectors for all $p \in [0,1)$. However, this does not necessarily
indicate that the performance of $\ell_p$-minimization ($0 \le p < 1$) is always better than that of $\ell_1$-minimization.
Example 1 in Section II-B indicates that sometimes $\ell_1$-minimization can successfully recover the original
sparse vector while $\ell_p$-minimization ($p \in (0,1)$) would return a vector that is denser than the original
vector. Moreover, our results for weak recovery indicate that the performance of $\ell_1$-minimization is
better than that of $\ell_p$-minimization for all $p \in [0,1)$ in at least the large $\alpha$ region ($\alpha > 0.9$).

We can roughly interpret the result as follows. Let $\alpha < 1$ be very close to 1, let $n$ be large enough
and let $A$ be a random Gaussian matrix. Then with overwhelming probability $\ell_1$-minimization can recover
all the vectors up to $\rho_1 n$-sparse and $\ell_p$-minimization with some $p \in [0,1)$ can recover all the vectors
up to $\rho_2 n$-sparse, and we know $\rho_1 < \rho_2$ from our discussion on the strong bound. Note that since the
limiting threshold of strong recovery via $\ell_p$-minimization increases to 0.5 as $p$ goes to 0, we have
$\rho_1 < \rho_2 < 0.5$. However, if we only consider the ability to recover all the vectors on one support
with one sign pattern, with overwhelming probability $\ell_1$-minimization can recover vectors up to
$\rho_3 n$-sparse, while $\ell_p$-minimization can recover vectors up to $\rho_4 n$-sparse. From the previous discussion about
the weak recovery threshold, we know that when $\alpha$ is very close to 1, $\rho_3 > \frac{2}{3} > \rho_4 > \frac{1}{2}$. Therefore we
have $\rho_3 > \rho_4 > \rho_2 > \rho_1$. We illustrate the difference between $\ell_1$ and $\ell_p$-minimization in Fig. 6 and Fig. 7.
Let $\Omega$ be the set of all $m \times n$ matrices with entries drawn from the standard Gaussian distribution, with
probability measure $P(\Omega) = 1$. We pick $\rho \in (\rho_1, \rho_2)$ in Fig. 6. For a random measurement matrix $A$
in $\Omega$, since $\rho < \rho_3$, for any fixed support $T_i$ with $|T_i| = \rho n$ and any fixed sign pattern $\sigma_j$, with high
probability $\ell_1$-minimization can recover all the $\rho n$-sparse vectors on $T_i$ with sign pattern $\sigma_j$. Since we
also have $\rho > \rho_1$, with high probability strong recovery of $\ell_1$-minimization fails; in other words,
$\ell_1$-minimization would fail to recover at least one vector with at most $\rho n$ non-zero entries. In Fig. 6
(a), $E^1_{ij}$ denotes the event that $\ell_1$-minimization can recover all the $\rho n$-sparse vectors on support $T_i$ with
sign pattern $\sigma_j$. Then $P(E^1_{ij})$ is very close to 1 for every $i$ and $j$. There are $\binom{n}{\rho n}$ different supports, and
for each support, there are $2^{\rho n}$ different sign patterns. Let $E$ denote the event that $\ell_1$-minimization can
recover all the $\rho n$-sparse vectors; then we have

$$E = \bigcap_{i \in \{1, \ldots, \binom{n}{\rho n}\}, \ j \in \{1, \ldots, 2^{\rho n}\}} E^1_{ij}.$$

Then although $P(E^1_{ij})$ is the same for all $i$ and $j$ and is very close to 1, $P(E)$ is close to 0, as indicated
in Fig. 6 (a). For $\ell_p$-minimization, since $\rho < \rho_2$, with high probability $\ell_p$-minimization can recover
all the $\rho n$-sparse vectors. In Fig. 6 (b), $E$ denotes the event that $\ell_p$-minimization can recover all the
$\rho n$-sparse vectors; then

$$E = \bigcap_{i \in \{1, \ldots, \binom{n}{\rho n}\}, \ j \in \{1, \ldots, 2^{\rho n}\}} E^p_{ij},$$

where $E^p_{ij}$ denotes the event that $\ell_p$-minimization recovers all the vectors on support $T_i$ with sign pattern
$\sigma_j$. In this case, $P(E)$ is close to 1 as indicated in Fig. 6 (b). In Fig. 7 we pick $\rho \in (\rho_4, \rho_3)$. Then
given any $i$ and $j$, $\ell_1$-minimization can recover all the vectors on $T_i$ with sign pattern $\sigma_j$ with high
probability, while $\ell_p$-minimization fails to recover at least one vector on $T_i$ with sign pattern $\sigma_j$ with
high probability. Therefore $P(E^1_{ij})$ is close to 1, while $P(E^p_{ij})$ is close to 0 for any given $i$ and $j$.
Therefore, if the sparse vectors we would like to recover are on one same support and share the same
sign pattern, $\ell_1$-minimization can be a better choice than $\ell_p$-minimization for all $p \in [0,1)$ regardless of
the amplitudes of the entries of a vector.
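To get a feel for how many events $E_{ij}$ enter the intersection defining $E$, the count $\binom{n}{\rho n} 2^{\rho n}$ can be evaluated directly. The helper below is a minimal sketch with a hypothetical name; it only counts (support, sign pattern) pairs and says nothing about the probabilities themselves.

```python
import math

def num_support_sign_events(n, k):
    """Number of (support, sign pattern) pairs for k-sparse vectors in R^n:
    C(n, k) choices of support times 2^k sign patterns on that support."""
    return math.comb(n, k) * 2 ** k

if __name__ == "__main__":
    # Example: n = 50 and rho = 0.2, so k = rho * n = 10.
    print(num_support_sign_events(50, 10))
```

Because this count grows exponentially in $n$, each individual event being very likely is compatible with their intersection being unlikely, which is the situation depicted for $\ell_1$-minimization in Fig. 6 (a).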




Fig. 6. Comparison of $\ell_1$ and $\ell_p$-minimization for $\rho \in (\rho_1, \rho_2)$: (a) $\ell_1$-minimization, (b) $\ell_p$-minimization.

Fig. 7. Comparison of $\ell_1$ and $\ell_p$-minimization for $\rho \in (\rho_4, \rho_3)$: (a) $\ell_1$-minimization, (b) $\ell_p$-minimization.

To better understand how the recovery performance changes from strong recovery to weak recovery,
let us consider another type of recovery: sectional recovery, which measures the ability to recover all
the vectors on one support $T$. Therefore, the requirement for successful sectional recovery is stricter than
that of weak recovery, but is looser than that of strong recovery. The necessary and sufficient condition
for successful sectional recovery can be stated as:

Theorem 9. $x$ is the unique solution to the $\ell_p$-minimization problem ($p \in [0,1]$) for all $\rho n$-sparse vectors $x$
on some support $T$, if and only if

$$\|B_T z\|_p^p < \|B_{T^c} z\|_p^p \qquad (46)$$

for all non-zero $z \in \mathcal{R}^{n-m}$.



The difference between the null space conditions for strong recovery and sectional recovery is that (46)
should hold for every support $T$ for strong recovery, but only needs to hold for one specific support $T$
for sectional recovery. Though for strong recovery, if the null space condition holds for $p \in [0,1]$, it also
holds for all $q \in [0,p]$, this argument is not true for sectional recovery. Consider a simple example in which the
basis $B$ of the null space of $A$ contains only one vector in $\mathcal{R}^4$ and $T = \{1, 2\}$. If $B = [16, 16, 1, 36]^T$, then one
can check that $\|B_T\|_1 = 32 < 37 = \|B_{T^c}\|_1$, but $\|B_T\|_{0.5}^{0.5} = 8 > 7 = \|B_{T^c}\|_{0.5}^{0.5}$. If $B = [1, 4, 1, 9]^T$, then
$\|B_T\|_1 < \|B_{T^c}\|_1$, and $\|B_T\|_{0.5}^{0.5} < \|B_{T^c}\|_{0.5}^{0.5}$. Therefore the fact that the null space condition of successful
sectional recovery holds for $p$ does not necessarily imply that it holds for another $q \neq p$.
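The two small examples above can be checked mechanically. The following sketch uses hypothetical helper names and zero-based indices, so the support $T = \{1, 2\}$ from the text becomes `[0, 1]`; it evaluates condition (46) for a null space spanned by a single vector.

```python
def lp_power(values, p):
    """Return sum_i |v_i|^p, i.e. the p-th power of the l_p quasi-norm."""
    return sum(abs(v) ** p for v in values)

def sectional_condition(B, T, p):
    """Check ||B_T||_p^p < ||B_{T^c}||_p^p for a single null-space vector B.

    T is a list of zero-based indices (an assumption of this sketch).
    """
    on_T = [B[i] for i in T]
    off_T = [B[i] for i in range(len(B)) if i not in T]
    return lp_power(on_T, p) < lp_power(off_T, p)

if __name__ == "__main__":
    T = [0, 1]
    for B in ([16, 16, 1, 36], [1, 4, 1, 9]):
        print(B, sectional_condition(B, T, 1.0), sectional_condition(B, T, 0.5))
```

For $B = [16, 16, 1, 36]$ the condition holds at $p = 1$ but fails at $p = 0.5$, while for $B = [1, 4, 1, 9]$ it holds at both values, matching the computation in the text.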

Following the technique in Section III-B, one can show that when $\alpha \to 1$ and $n$ is large enough, the
recovery threshold of sectional recovery is 1/2 for all $p \in [0,1]$. We skip the proof here as it follows
the lines in Section III-B. To summarize, regarding the recovery threshold when $\alpha \to 1$, $\ell_p$-minimization
($p \in [0,1]$) has a higher threshold for smaller $p$ for strong recovery; the threshold is 1/2 for all
$p \in [0,1]$ for sectional recovery; and the threshold is 2/3 for $p \in [0,1)$ and 1 for $p = 1$ for weak
recovery. We can see how recovery performance changes when the requirement for successful recovery
changes from strong to weak.



VI. Numerical Experiments 



We present the results of numerical experiments to explore the performance of $\ell_p$-minimization. As
mentioned earlier, (3) is indeed non-convex and it is hard to compute its global minimum. Here we
employ the iteratively reweighted least squares (IRLS) algorithm [11], [12] to compute a local minimum of (3);
please refer to [12] for the details of the algorithm.
Example 2. $\ell_p$-minimization using IRLS [12]

We fix $n = 200$ and $m = 100$, and increase $\rho$ from 0.01 to 0.5 as a percentage of $n$. For each
$\rho$, we repeat the following procedure 100 times. We first generate an $n$-dimensional vector $x$ with $\rho n$
nonzero entries. The locations of the non-zero entries are chosen randomly, and each non-zero value
follows the standard Gaussian distribution. We then generate an $m \times n$ matrix $A$ with i.i.d. $\mathcal{N}(0,1)$
entries. We let $y = Ax$ and run the iteratively reweighted least squares algorithm to search for a local
minimum of (3) with $p$ chosen to be 0.2, 0.5, and 0.8 respectively. Let $x^*$ be the output of the algorithm;
if $\|x^* - x\|_2 <$ 10~^, we say the recovery of $x$ is successful. Figure 8 records the percentage of
times that the recovery is successful for different sparsity $\rho n$. Note that the iteratively reweighted least
squares algorithm is designed to obtain a local minimum of the $\ell_p$-minimization problem and is not
guaranteed to obtain the global minimum. However, as shown in Figure 8, it indeed recovers the sparse
vectors up to certain sparsity. For $\ell_{0.2}$, $\ell_{0.5}$ and $\ell_{0.8}$-minimization computed by the heuristic, the sparsity
ratios of successful recovery are 0.025, 0.024, and 0.015 respectively.
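For orientation, one common form of IRLS for $\ell_p$-minimization can be sketched in a few lines. This is an assumption-laden illustration, not the exact algorithm of [12]: the smoothed weights $(x_i^2 + \varepsilon)^{p/2 - 1}$, the schedule that divides $\varepsilon$ by 10 after a fixed number of inner iterations, and all function names are choices made here. Each step solves the weighted least squares problem $\min \sum_i w_i x_i^2$ subject to $Ax = y$ in closed form, $x = Q A^T (A Q A^T)^{-1} y$ with $Q = \mathrm{diag}(1/w_i)$.

```python
import random

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = s / aug[r][r]
    return z

def irls_lp(A, y, p, inner=10, eps_steps=8):
    """IRLS sketch for min ||x||_p^p s.t. Ax = y (finds a local minimum at best)."""
    m, n = len(A), len(A[0])
    x = [1.0] * n  # equal weights first, so the first iterate is the min l_2 solution
    for j in range(eps_steps):
        eps = 10.0 ** (-j)  # assumed smoothing schedule: 1, 0.1, ..., 1e-7
        for _ in range(inner):
            # q_k = 1 / w_k with w_k = (x_k^2 + eps)^(p/2 - 1)
            q = [(xk * xk + eps) ** (1.0 - p / 2.0) for xk in x]
            G = [[sum(A[r][k] * q[k] * A[s][k] for k in range(n))
                  for s in range(m)] for r in range(m)]
            z = solve(G, y)
            x = [q[k] * sum(A[r][k] * z[r] for r in range(m)) for k in range(n)]
    return x

def random_instance(m, n, k, seed=0):
    """Gaussian A, a k-sparse Gaussian x on a random support, and y = A x."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    x = [0.0] * n
    for i in rng.sample(range(n), k):
        x[i] = rng.gauss(0.0, 1.0)
    y = [sum(A[r][c] * x[c] for c in range(n)) for r in range(m)]
    return A, x, y
```

A call such as `A, x, y = random_instance(100, 200, 10)` followed by `irls_lp(A, y, 0.5)` mirrors the setup of Example 2 in spirit. Every iterate satisfies $Ax = y$ by construction, but whether the output equals the original sparse vector depends on the instance, as the discussion of local minima above indicates.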
Example 3. Strong recovery vs. weak recovery 

We also compare the performance of $\ell_p$-minimization and $\ell_1$-minimization both for strong recovery in Fig. 9 and for weak recovery in Fig. 10 when $\alpha$ is large. We employ CVX [24] to solve $\ell_1$-minimization and still employ the iteratively reweighted least squares algorithm to compute a local minimum of $\ell_p$-minimization. We fix $n = 50$ and $m = 48$, independently generate one hundred random matrices $A^{m \times n}$ with i.i.d. $\mathcal{N}(0,1)$ entries, and evaluate the performance of strong recovery and weak recovery. For each matrix, we increase $\rho$ from 0.04 to 1.

In weak recovery, we consider recovering nonnegative vectors on support $T = \{1, \ldots, \rho n\}$. For a given $\rho$, we generate one hundred and fifty vectors and claim the weak recovery of $\rho n$-sparse vectors to be successful if and only if all the vectors are successfully recovered. For each vector $\mathbf{x}$, $x_i$ ($i \in T$) is generated from $\mathcal{N}(0,1)$ with probability 0.5, and from $\mathcal{N}(1000,1)$ with probability 0.5. As discussed in Section III, the condition for successful weak recovery via $\ell_1$-minimization is the same for every nonnegative vector on $T$; therefore, if $\ell_1$-minimization recovers all the vectors we generated, it should also recover all the nonnegative vectors on $T$. $\ell_p$-minimization ($p \in [0,1)$), on the other hand, can recover some nonnegative vectors on $T$ while failing to recover other nonnegative vectors on $T$. Therefore, since we could not check every nonnegative $\mathbf{x}$ on $T$, $\ell_p$-minimization ($p < 1$) can still fail to recover some other nonnegative vector on $T$ even if we declare the weak recovery to be "successful".

In strong recovery, for each $\rho$, we generate two hundred vectors and claim the strong recovery to be successful if and only if all these vectors are correctly recovered. To generate a $\rho n$-sparse vector $\mathbf{x}$, we first randomly pick a support $T$ with $|T| = \rho n$. Each $x_i$ ($i \in T$) is generated from $\mathcal{N}(0,1)$ with probability 0.5, from $\mathcal{N}(1000,1)$ with probability 0.25, and from $\mathcal{N}(-1000,1)$ with probability 0.25. The average performance over one hundred random matrices for strong recovery is plotted in Fig. 9, and the average performance for weak recovery is plotted in Fig. 10.

Fig. 8. Successful recovery of $\rho n$-sparse vectors via $\ell_p$-minimization

Fig. 9. Successful strong recovery of $\rho n$-sparse vectors

Fig. 10. Successful weak recovery of $\rho n$-sparse vectors
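The strong-recovery test of Example 3 (random support, three-component Gaussian mixture, all-or-nothing success criterion) can be sketched as below. The names `strong_recovery_vector` and `strong_recovery_trial`, the generic `solver(A, y)` callable, and the tolerance `tol` are hypothetical; the text does not state a numerical tolerance for this example.

```python
import numpy as np

def strong_recovery_vector(n, k, rng):
    """k-sparse vector on a random support with the Example 3 mixture entries."""
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    # Component mean per entry: 0 w.p. 0.5, +1000 w.p. 0.25, -1000 w.p. 0.25.
    means = rng.choice([0.0, 1000.0, -1000.0], size=k, p=[0.5, 0.25, 0.25])
    x[support] = means + rng.standard_normal(k)   # unit-variance Gaussian around each mean
    return x, np.sort(support)

def strong_recovery_trial(solver, A, k, num_vectors=200, tol=1e-4, rng=None):
    """Return True only if solver(A, Ax) recovers every generated vector.

    `solver` is any callable (A, y) -> x_hat; `tol` is an assumed tolerance.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    for _ in range(num_vectors):
        x, _ = strong_recovery_vector(n, k, rng)
        if np.linalg.norm(solver(A, A @ x) - x) >= tol:
            return False                           # one failure fails the whole trial
    return True
```

Averaging the Boolean outcome over independently drawn matrices gives an empirical success frequency of the kind reported in Fig. 9. Because only finitely many vectors are tested, a `True` outcome is evidence of strong recovery for that matrix rather than a proof of it.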



Note that we only apply the iteratively reweighted least squares algorithm to approximate the performance of $\ell_p$-minimization; therefore, the solution returned by the algorithm may not always be the solution of $\ell_p$-minimization. Simulation results indicate that for strong recovery, the recovery threshold increases as $p$ decreases, while for weak recovery, interestingly, the recovery threshold of $\ell_1$-minimization is higher than that of $\ell_p$-minimization for any $p < 1$.

VII. Conclusion 

This paper analyzes the ability of $\ell_p$-minimization ($0 \le p \le 1$) to recover high-dimensional sparse vectors from low-dimensional linear measurements, where the measurement matrix $A^{m \times n}$ has i.i.d. standard Gaussian entries. When $\alpha = m/n \to 1$, we provide a tight threshold $\rho^*(p)$ of the sparsity ratio separating the success and failure of strong recovery, which requires recovering all the sparse vectors. $\rho^*(p)$ strictly decreases from 0.5 to 0.239 as $p$ increases from 0 to 1. For weak recovery, which only needs to recover sparse vectors on some support with some sign pattern, we first provide an equivalent null space characterization of successful weak recovery, then prove that the threshold of sparsity ratio separating the success and failure of $\ell_p$-minimization is $2/3$ for all $p < 1$, compared with the threshold 1 for $\ell_1$-minimization. For any $\alpha < 1$, we provide a bound $\rho^*(\alpha,p)$ of sparsity ratio below which strong recovery via $\ell_p$-minimization succeeds with overwhelming probability, and our bound $\rho^*(\alpha,p)$ improves on the existing bounds in the large $\alpha$ region. We also provide a bound on the sparsity ratio below which weak recovery succeeds with overwhelming probability.

Throughout the paper, we assume that the measurements $\mathbf{y} = A\mathbf{x}$ are exact, and it would be interesting to consider the case where the measurements are noisy, i.e., $\mathbf{y} = A\mathbf{x} + \mathbf{e}$ where $\mathbf{e}$ is the noise vector. Moreover, we assume that $\mathbf{x}$ is exactly sparse, i.e., most of its entries are exactly zero. The extension of the results to approximately sparse vectors whose coefficients (if ordered) decay rapidly is also worth pursuing.

Appendix

A. Calculation of $\lambda_{\max}(\alpha,p)$ in Lemma 7

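Define $c_{\max} = \frac{1}{n}\max_{\mathbf{z}\in S}\|B\mathbf{z}\|_p^p$; then for any non-zero vector $\mathbf{z}$, $\|B\mathbf{z}\|_p^p \le \|\mathbf{z}\|_2^p\, c_{\max}\, n$. Let $\Sigma_1$ be a $\gamma$-net of $S$ with cardinality at most $(1+2/\gamma)^{n-m}$ [28], with $\gamma > 0$ to be chosen later, and define

\[
\eta = \frac{1}{n}\max_{\mathbf{z}\in\Sigma_1}\|B\mathbf{z}\|_p^p .
\]

Then from the definition of a $\gamma$-net, for every $\mathbf{z} \in S$, there exists $\mathbf{z}' \in \Sigma_1$ such that $\|\mathbf{z} - \mathbf{z}'\|_2 \le \gamma$. Note that for every $\mathbf{z} \in S$, $\|B\mathbf{z}\|_p^p \le \|B\mathbf{z}'\|_p^p + \|B(\mathbf{z} - \mathbf{z}')\|_p^p \le \eta n + \gamma^p c_{\max} n$. Then $c_{\max} n \le \eta n + \gamma^p c_{\max} n$, which leads to

\[
c_{\max} \le \eta/(1-\gamma^p). \tag{47}
\]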

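To characterize $c_{\max}$, we first characterize $\eta$. We will show that there exists a constant $a > E[|X|^p]$, where $X \sim \mathcal{N}(0,1)$, such that with overwhelming probability, $\|B\mathbf{z}\|_p^p \le an$ for all $\mathbf{z}$ in $\Sigma_1$. Given $\mathbf{z} \in S$, $B_i\mathbf{z}$ ($i = 1, \ldots, n$) are i.i.d. $\mathcal{N}(0,1)$ random variables, where $B_i$ is the $i$-th row of $B$. Then

\[
\begin{aligned}
P(\eta > a) &= P(\exists\, \mathbf{z} \in \Sigma_1 \ \text{s.t.}\ \|B\mathbf{z}\|_p^p > an) \\
&\le \sum_{\mathbf{z}\in\Sigma_1} P(\|B\mathbf{z}\|_p^p > an) \\
&\le (1+2/\gamma)^{n-m} \min_{t>0} e^{-tan} E\big[e^{t\|B\mathbf{z}\|_p^p}\big] \\
&= (1+2/\gamma)^{(1-\alpha)n} \min_{t>0} e^{-tan} E\big[e^{t|X|^p}\big]^n \\
&= e^{\left((1-\alpha)\log(1+2/\gamma) + \min_{t>0}\left(\log(E[e^{t|X|^p}]) - at\right)\right)n},
\end{aligned}
\tag{48}
\]

where $X \sim \mathcal{N}(0,1)$, the first inequality follows from the union bound, and the second inequality follows from the Chernoff bound.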

Since the second-order derivative of $\log(E[e^{t|X|^p}]) - at$ with respect to $t$ is positive, its minimum is achieved where its first-order derivative is 0. To calculate the value of $t$ where the minimum is achieved, we have

\[
\frac{d\big[\log(E[e^{t|X|^p}]) - at\big]}{dt} = 0
\iff
\frac{E\big[|X|^p e^{t|X|^p}\big]}{E\big[e^{t|X|^p}\big]} = a. \tag{49}
\]



Note that when $a > E[|X|^p]$, the solution of $t$ to (49) is always positive, thus it is also the solution to $\min_{t>0}(\log(E[e^{t|X|^p}]) - at)$. One can check that for any $\gamma$, the exponent in (48) is negative when $a$ is large enough. To see this, let $t = c/a$ with $c = 2(1-\alpha)\log(1+2/\gamma)$; then $\log(E[e^{t|X|^p}]) - at$ goes to $-2(1-\alpha)\log(1+2/\gamma)$ as $a$ goes to infinity. Thus, when $a$ is sufficiently large, $\log(E[e^{t|X|^p}]) - at < -(1-\alpha)\log(1+2/\gamma)$ for $t = c/a$. Therefore, the exponent in (48) is negative when $a$ is large enough. Thus, we can pick $a(\alpha,p,\gamma)$ large enough such that there exists some constant $c_{12} > 0$ for which $P(\eta > a(\alpha,p,\gamma)) \le e^{-c_{12}n}$ holds. Then

\[
P\Big(c_{\max} > \frac{a(\alpha,p,\gamma)}{1-\gamma^p}\Big) \le P\Big(\frac{\eta}{1-\gamma^p} > \frac{a(\alpha,p,\gamma)}{1-\gamma^p}\Big) \le e^{-c_{12}n},
\]

where the first inequality follows from (47). Let

\[
\lambda_{\max}(\alpha,p) = \min_{\gamma}\, a(\alpha,p,\gamma)/(1-\gamma^p), \tag{50}
\]

then there exists $c_{12}(\alpha,\lambda_{\max}) > 0$ such that with probability at least $1 - e^{-c_{12}n}$, for every $\mathbf{z} \in S$, $\|B\mathbf{z}\|_p^p \le \lambda_{\max} n$. Thus, Lemma 7 follows.
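The quantities appearing in (48) and (49) can be evaluated numerically. The Python sketch below is illustrative only: the quadrature grid, the search range `t_max`, and the function names are assumptions and not part of the paper. It uses the closed form $E[|X|^p] = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$ for $X \sim \mathcal{N}(0,1)$ and exploits the convexity in $t$ of $\log(E[e^{t|X|^p}]) - at$.

```python
import math
import numpy as np

X_GRID = np.linspace(0.0, 40.0, 40001)   # assumed quadrature grid for |X|

def abs_moment(p):
    """E|X|^p for X ~ N(0,1)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)

def log_mgf(t, p):
    """log E[exp(t |X|^p)] by trapezoidal quadrature, shifted to avoid overflow."""
    g = t * X_GRID**p - 0.5 * X_GRID**2            # log-integrand up to a constant
    shift = g.max()
    f = np.exp(g - shift)
    integral = np.sum((f[1:] + f[:-1]) * np.diff(X_GRID)) / 2.0
    return shift + math.log(integral) + 0.5 * math.log(2.0 / math.pi)

def chernoff_min(a, p, t_max=100.0, iters=100):
    """min over t in [0, t_max] of log E[e^{t|X|^p}] - a t (ternary search, convex in t)."""
    lo, hi = 0.0, t_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_mgf(m1, p) - a * m1 < log_mgf(m2, p) - a * m2:
            hi = m2
        else:
            lo = m1
    t = 0.5 * (lo + hi)
    return log_mgf(t, p) - a * t

def exponent_48(a, alpha, gamma, p):
    """Per-n exponent of the bound (48); negative means the bound decays with n."""
    return (1.0 - alpha) * math.log(1.0 + 2.0 / gamma) + chernoff_min(a, p)
```

For fixed $\gamma$, scanning `a` upward from `abs_moment(p)` until `exponent_48` turns negative gives a numerical candidate for $a(\alpha,p,\gamma)$; minimizing $a(\alpha,p,\gamma)/(1-\gamma^p)$ over a grid of $\gamma \in (0,1)$ then approximates (50).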

B. Proof of Lemma 8 

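Proof: Define $c'_{\max} = \frac{1}{(1-\rho)n}\max_{\mathbf{z}\in S}\|B_{T^c}\mathbf{z}\|_p^p$. Let $\Sigma_4$ be a $\gamma$-net of $S$ with cardinality at most $(1+2/\gamma)^{n-m}$, with $\gamma$ being the value where $\lambda_{\max}(\frac{\alpha-\rho}{1-\rho},p)$ is achieved, and define

\[
\eta' = \frac{1}{(1-\rho)n}\max_{\mathbf{z}\in\Sigma_4}\|B_{T^c}\mathbf{z}\|_p^p .
\]

Then, as in the calculation of $\lambda_{\max}(\alpha,p)$ in Appendix A, we have

\[
c'_{\max} \le \eta'/(1-\gamma^p).
\]

We use $\lambda_{\max}$ to denote $\lambda_{\max}(\frac{\alpha-\rho}{1-\rho},p)$ for simplicity. We first show that with overwhelming probability, $\|B_{T^c}\mathbf{z}\|_p^p \le (1-\rho)\lambda_{\max} n$ for all $\mathbf{z}$ in $S$, or equivalently $c'_{\max} \le \lambda_{\max}$. Note that

\[
\begin{aligned}
P(c'_{\max} > \lambda_{\max})
&\le P\big(\eta'/(1-\gamma^p) > \lambda_{\max}\big) \\
&= P\big(\exists\, \mathbf{z} \in \Sigma_4 \ \text{s.t.}\ \|B_{T^c}\mathbf{z}\|_p^p > (1-\rho)\lambda_{\max}(1-\gamma^p)n\big) \\
&\le \sum_{\mathbf{z}\in\Sigma_4} P\big(\|B_{T^c}\mathbf{z}\|_p^p > (1-\rho)\lambda_{\max}(1-\gamma^p)n\big) \\
&\le \Big(1+\frac{2}{\gamma}\Big)^{n-m} \min_{t>0} \frac{E\big[e^{t\|B_{T^c}\mathbf{z}\|_p^p}\big]}{e^{t(1-\rho)\lambda_{\max}(1-\gamma^p)n}} \\
&= \Big(1+\frac{2}{\gamma}\Big)^{n-m} \min_{t>0} \frac{E\big[e^{t|X|^p}\big]^{(1-\rho)n}}{e^{t(1-\rho)\lambda_{\max}(1-\gamma^p)n}} \\
&= e^{(1-\rho)n\left(\frac{1-\alpha}{1-\rho}\log(1+\frac{2}{\gamma}) + \min_{t>0}\left(\log(E[e^{t|X|^p}]) - \lambda_{\max}(1-\gamma^p)t\right)\right)},
\end{aligned}
\tag{51}
\]

where $X \sim \mathcal{N}(0,1)$. From the definition of $\lambda_{\max}(\frac{\alpha-\rho}{1-\rho},p)$, and since $\gamma$ is chosen to be the value where $\lambda_{\max}(\frac{\alpha-\rho}{1-\rho},p)$ is achieved, we know that there exists $c_{13} > 0$ such that (51) $\le e^{-c_{13}n}$. Therefore it holds with probability at least $1 - e^{-c_{13}n}$ that for all $\mathbf{z} \in S$, $\|B_{T^c}\mathbf{z}\|_p^p \le (1-\rho)\lambda_{\max} n$.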

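Similarly, define $c'_{\min} = \frac{1}{(1-\rho)n}\min_{\mathbf{z}\in S}\|B_{T^c}\mathbf{z}\|_p^p$. Let $\Sigma_5$ be a $\gamma$-net of $S$ with cardinality at most $(1+2/\gamma)^{n-m}$, with $\gamma$ being the value where $\lambda_{\min}(\frac{\alpha-\rho}{1-\rho},p)$ is achieved. Note that

\[
\lambda_{\min}\Big(\frac{\alpha-\rho}{1-\rho},p\Big) = \gamma^{p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)} - \gamma^p \lambda_{\max}\Big(\frac{\alpha-\rho}{1-\rho},p\Big)
\]

for some $\epsilon > 0$, according to the definition of $\lambda_{\min}(\frac{\alpha-\rho}{1-\rho},p)$. We use $\lambda_{\min}$ and $\lambda_{\max}$ to denote $\lambda_{\min}(\frac{\alpha-\rho}{1-\rho},p)$ and $\lambda_{\max}(\frac{\alpha-\rho}{1-\rho},p)$ for simplicity. We define

\[
\theta' = \frac{1}{(1-\rho)n}\min_{\mathbf{z}\in\Sigma_5}\|B_{T^c}\mathbf{z}\|_p^p .
\]

As in the calculation of $\lambda_{\min}(\alpha,p)$ in Section IV-A, we have

\[
c'_{\min} \ge \theta' - \gamma^p c'_{\max}.
\]

We next show that with overwhelming probability, $\|B_{T^c}\mathbf{z}\|_p^p \ge (1-\rho)\lambda_{\min} n$ for all $\mathbf{z}$ in $S$, or equivalently $c'_{\min} \ge \lambda_{\min}$. Note that

\[
\begin{aligned}
P(c'_{\min} < \lambda_{\min})
&= P\Big(c'_{\min} < \gamma^{p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)} - \gamma^p\lambda_{\max}\Big) \\
&\le P\Big(\theta' - \gamma^p c'_{\max} < \gamma^{p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)} - \gamma^p\lambda_{\max}\Big) \\
&\le P\Big(\theta' < \gamma^{p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)}\Big) + P(c'_{\max} > \lambda_{\max}) \\
&\le P\Big(\theta' < \gamma^{p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)}\Big) + e^{-c_{13}n},
\end{aligned}
\tag{52}
\]

where the last inequality follows from (51). To calculate $P(\theta' < \gamma^{p(\frac{1-\alpha}{1-\rho}+\epsilon)})$, note that

\[
\begin{aligned}
P\Big(\theta' < \gamma^{p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)}\Big)
&= P\Big(\exists\, \mathbf{z} \in \Sigma_5 \ \text{s.t.}\ \|B_{T^c}\mathbf{z}\|_p^p < (1-\rho)\gamma^{p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)} n\Big) \\
&\le \sum_{\mathbf{z}\in\Sigma_5} P\Big(\sum_{i\in T^c}|B_i\mathbf{z}|^p < (1-\rho)\gamma^{p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)} n\Big) \\
&\le \Big(1+\frac{2}{\gamma}\Big)^{n-m} e^{(1-\rho)n}\, E\Big[e^{-\gamma^{-p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)}|X|^p}\Big]^{(1-\rho)n} \\
&= e^{(1-\rho)n\left(\frac{1-\alpha}{1-\rho}\log(1+\frac{2}{\gamma}) + \log\left(E\left[e^{-\gamma^{-p\left(\frac{1-\alpha}{1-\rho}+\epsilon\right)}|X|^p}\right]\right) + 1\right)} \\
&= e^{(1-\rho)n\left(\frac{1-\alpha}{1-\rho}\log(1+\frac{2}{\gamma}) + \log\left(O\left(\gamma^{\frac{1-\alpha}{1-\rho}+\epsilon}\right)\right) + 1\right)},
\end{aligned}
\tag{53}
\]

where $X \sim \mathcal{N}(0,1)$, the second inequality follows from the Chernoff bound, and the last equality follows from (35). Since $\gamma$ is chosen to be the value where $\lambda_{\min}(\frac{\alpha-\rho}{1-\rho},p)$ is achieved, then according to the definition of $\lambda_{\min}(\frac{\alpha-\rho}{1-\rho},p)$, (53) $\le e^{-kn}$ for some $k > 0$. Thus, from (52) we have

\[
P(c'_{\min} < \lambda_{\min}) \le e^{-kn} + e^{-c_{13}n} \le e^{-c_{14}n},
\]

for some $c_{14} > 0$. Then, with probability at least $1 - e^{-c_{14}n}$, for all $\mathbf{z} \in S$, $\|B_{T^c}\mathbf{z}\|_p^p \ge (1-\rho)\lambda_{\min} n$.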



C. Calculation of $\lambda_{\max}(\alpha,\rho,p)$ in Lemma 9

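Proof: Define $c_{\max} = \frac{1}{\rho n}\max_{\mathbf{z}\in S}\|B_{T^-}\mathbf{z}\|_p^p$. Let $\Sigma_6$ be a $\gamma$-net of $S$ with cardinality at most $(1+2/\gamma)^{n-m}$, with $\gamma > 0$ to be chosen later, and define $\eta = \frac{1}{\rho n}\max_{\mathbf{z}\in\Sigma_6}\|B_{T^-}\mathbf{z}\|_p^p$. Then from (25), for any $\mathbf{z} \in S$, $\mathbf{z} = \sum_{j\ge 0}\gamma_j\mathbf{v}_j$ holds, where $\gamma_0 = 1$, $\gamma_j \le \gamma^j$ and $\mathbf{v}_j \in \Sigma_6$. From (26) we have

\[
\begin{aligned}
\|B_{T^-}\mathbf{z}\|_p^p
&\le \sum_{j\ge 0}\ \sum_{i\in T:\,(B_i\mathbf{v}_j)x_i<0} \gamma_j^p\,|B_i\mathbf{v}_j|^p \\
&\le \sum_{j\ge 0} \gamma^{jp}\,\eta\rho n \\
&\le \eta\rho n/(1-\gamma^p).
\end{aligned}
\tag{54}
\]

Since (54) holds for every $\mathbf{z} \in S$, then $c_{\max}\rho n \le \eta\rho n/(1-\gamma^p)$, which leads to $c_{\max} \le \eta/(1-\gamma^p)$. Define a random variable $S_i$ for each $i$ in $T$ that is equal to 1 if $B_i\mathbf{z}\,x_i < 0$ and equal to 0 otherwise. Then $\|B_{T^-}\mathbf{z}\|_p^p = \sum_{i\in T}|B_i\mathbf{z}|^p S_i$. Then for any $a$,

\[
\begin{aligned}
P\Big(c_{\max} > \frac{a}{1-\gamma^p}\Big)
&\le P\Big(\frac{\eta}{1-\gamma^p} > \frac{a}{1-\gamma^p}\Big) \\
&= P(\eta > a) = P\big(\exists\, \mathbf{z} \in \Sigma_6 \ \text{s.t.}\ \|B_{T^-}\mathbf{z}\|_p^p > a\rho n\big) \\
&\le \sum_{\mathbf{z}\in\Sigma_6} P\big(\|B_{T^-}\mathbf{z}\|_p^p > a\rho n\big) \\
&\le \Big(1+\frac{2}{\gamma}\Big)^{n-m} P\Big(\sum_{i\in T}|B_i\mathbf{z}|^p S_i > a\rho n\Big) \\
&\le \Big(1+\frac{2}{\gamma}\Big)^{(1-\alpha)n} \min_{t>0}\frac{E\big[e^{t|X|^p S}\big]^{\rho n}}{e^{ta\rho n}} \\
&= e^{\left((1-\alpha)\log(1+\frac{2}{\gamma}) + \rho\min_{t>0}\left(\log(E[e^{t|X|^p S}]) - at\right)\right)n},
\end{aligned}
\tag{55}
\]

where $X \sim \mathcal{N}(0,1)$, $S = 1$ if $X < 0$ and $S = 0$ otherwise.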
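As a side remark not spelled out in the text, the expectations involving $S$ reduce to those of Appendix A. Since $S$ is the indicator of $\{X < 0\}$, and $X \sim \mathcal{N}(0,1)$ is symmetric with $P(X<0) = 1/2$:

```latex
\begin{align*}
E\big[e^{t|X|^p S}\big]
  &= P(X \ge 0) + E\big[e^{t|X|^p}\,\mathbf{1}\{X<0\}\big]
   = \tfrac{1}{2} + \tfrac{1}{2}\,E\big[e^{t|X|^p}\big], \\
E\big[|X|^p S\big]
  &= \tfrac{1}{2}\,E\big[|X|^p\big].
\end{align*}
```

In particular, the exponent in (55) can be evaluated with the same one-dimensional Gaussian integral that appears in (48).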

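Since the second derivative of $\log(E[e^{t|X|^p S}]) - at$ with respect to $t$ is positive, its minimum is achieved where its first derivative is 0. To calculate the value of $t$ where the minimum is achieved, we have

\[
\frac{d\big[\log(E[e^{t|X|^p S}]) - at\big]}{dt} = 0
\iff
\frac{E\big[|X|^p S\, e^{t|X|^p S}\big]}{E\big[e^{t|X|^p S}\big]} = a. \tag{56}
\]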



Note that when $a > E[|X|^p S]$, the solution of $t$ to (56) is always positive, thus it is also the solution to $\min_{t>0}(\log(E[e^{t|X|^p S}]) - at)$. Given any $\rho$ and $\gamma$, when $a$ is large enough, the exponent in (55) is negative. We can pick $a(\alpha,\rho,p,\gamma)$ as small as possible while still keeping the exponent in (55) negative. Let

\[
\lambda_{\max}(\alpha,\rho,p) = \min_{\gamma}\frac{a(\alpha,\rho,p,\gamma)}{1-\gamma^p}, \tag{57}
\]

then there exists $c_{15} > 0$ such that with probability at least $1 - e^{-c_{15}n}$, $c_{\max} \le \lambda_{\max}(\alpha,\rho,p)$, or equivalently, for every $\mathbf{z} \in S$, $\|B_{T^-}\mathbf{z}\|_p^p \le \rho\,\lambda_{\max}(\alpha,\rho,p)\,n$. Thus, Lemma 9 follows.



